Message ID | 20210426105300.994615-1-numans@ovn.org |
---|---|
State | Not Applicable |
Headers | show |
Series | [ovs-dev] ofctrl: Don't assert while merging tracked flows. | expand |
Thanks Numan for working on this issue! On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote: > > From: Numan Siddique <numans@ovn.org> > > Presently we do ovs_assert(del_f->installed_flows) in > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal > and if 'del_f' is not installed. But there are a couple of scenarios > this can happen: > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() couldn't > install the flow (because of the rconn queue is full) and an SB update > caused 'F1' to be removed and re-added as 'F2'. > This seems not possible, because del_f->installed_flow is set whenever a "installed_flow" is added to the installed_flows table, regardless of whether it is sent to OVS or not. > 2. In a single engine run, a flow 'F1' was added to the desired flow, > removed from the desired flow and re-added as 'F2'. This could > happen after the commit [1]. In theory this should not happen either, because if F1 was added and then removed before it was installed, it would have been removed in track_flow_del(): if (!f->installed_flow) { /* If it is not installed yet, simply destroy it. */ desired_flow_destroy(f); return; } In fact this is what the comment is supposed to say, but the comment had a typo (my bad): /* del_f must have been installed, otherwise it should have been * removed during track_flow_add_or_modify. */ s/track_flow_add_or_modify/track_flow_del In practice, if commit [1] did trigger this bt, then I'd look deeper into the logic, but I think removing the assert may just hide the problem. > > The likelihood of (2) seems to be higher with datapath groups enabled. > > The assertion - ovs_assert(del_f->installed_flows) is observed in > few deployments after the commit [1]. This patch addressed this issue > by removing that assertion. Removing this assertion should not have > any side effects as the flow 'F2' will be installed. > > This patch is proposed based on the code inspection and the possibility > that the above mentioned scenarios can happen. The patch doesn't have > any test cases to reproduce these scenarios. > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing flows with conj actions.") > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > Signed-off-by: Numan Siddique <numans@ovn.org> > --- > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > 1 file changed, 22 insertions(+), 4 deletions(-) > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > index c29c3d180..9e960b034 100644 > --- a/controller/ofctrl.c > +++ b/controller/ofctrl.c > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct ovn_desired_flow_table *flow_table) > continue; > } > > - /* del_f must have been installed, otherwise it should have been > - * removed during track_flow_add_or_modify. */ > - ovs_assert(del_f->installed_flow); > + if (!del_f->installed_flow) { > + /* del_f must have been installed, otherwise it should have > + * been removed during track_flow_add_or_modify. > + * > + * But however there are a couple of scenarios this may not > + * happen. > + * > + * Scenario 1: A flow was added to the desired flows, but > + * ofctrl_put() couldn't install the flow and > + * an SB update caused the 'del_f' to be removed > + * and re-added as 'f'. > + * > + * Scenario 2: In a single engine run, a flow 'del_f' was > + * added to the desired flow, removed from the > + * desired flow and re-added as 'f'. This could > + * happen after the commit c6c61b4e3462. > + * > + * Treat both the scenarios as valid scenarios and just remove > + * 'del_f' from the hmap - deleted_flows. > + * update_installed_flows_by_track() will install 'f'. > + */ > > - if (!f->installed_flow) { > + } else if (!f->installed_flow) { > /* f is not installed yet. */ > replace_installed_to_desired(del_f->installed_flow, del_f, f); > } else { > -- > 2.29.2 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote: > > Thanks Numan for working on this issue! > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote: > > > > From: Numan Siddique <numans@ovn.org> > > > > Presently we do ovs_assert(del_f->installed_flows) in > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal > > and if 'del_f' is not installed. But there are a couple of scenarios > > this can happen: > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() couldn't > > install the flow (because of the rconn queue is full) and an SB update > > caused 'F1' to be removed and re-added as 'F2'. > > > > This seems not possible, because del_f->installed_flow is set whenever a > "installed_flow" is added to the installed_flows table, regardless of > whether it is sent to OVS or not. This can happen if ofctrl_can_put() returns false in which case ovn-controller will not call ofctrl_put() at all. There is a theoretical possibility here for sure. Let me know if you think otherwise. We observed this assertion during the upgrades of openshift to an ovn version and there were a lot of "unreasonable timeout" warnings. The updated OVN version had the commit [1] which led me to think that commit [1] can also cause this. > > > 2. In a single engine run, a flow 'F1' was added to the desired flow, > > removed from the desired flow and re-added as 'F2'. This could > > happen after the commit [1]. > > In theory this should not happen either, because if F1 was added and then > removed before it was installed, it would have been removed in > track_flow_del(): > > if (!f->installed_flow) { > /* If it is not installed yet, simply destroy it. */ > desired_flow_destroy(f); > return; > } > > In fact this is what the comment is supposed to say, but the comment had a > typo (my bad): > > /* del_f must have been installed, otherwise it should have been > * removed during track_flow_add_or_modify. */ > > s/track_flow_add_or_modify/track_flow_del This typo really confused me :). From your comments the scenario 2 seems not possible. But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE. What do you think ? Thanks Numan > > In practice, if commit [1] did trigger this bt, then I'd look deeper into > the logic, but I think removing the assert may just hide the problem. > > > > > The likelihood of (2) seems to be higher with datapath groups enabled. > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in > > few deployments after the commit [1]. This patch addressed this issue > > by removing that assertion. Removing this assertion should not have > > any side effects as the flow 'F2' will be installed. > > > > This patch is proposed based on the code inspection and the possibility > > that the above mentioned scenarios can happen. The patch doesn't have > > any test cases to reproduce these scenarios. > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing flows > with conj actions.") > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > > Signed-off-by: Numan Siddique <numans@ovn.org> > > --- > > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > index c29c3d180..9e960b034 100644 > > --- a/controller/ofctrl.c > > +++ b/controller/ofctrl.c > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct ovn_desired_flow_table > *flow_table) > > continue; > > } > > > > - /* del_f must have been installed, otherwise it should have > been > > - * removed during track_flow_add_or_modify. */ > > - ovs_assert(del_f->installed_flow); > > + if (!del_f->installed_flow) { > > + /* del_f must have been installed, otherwise it should > have > > + * been removed during track_flow_add_or_modify. > > + * > > + * But however there are a couple of scenarios this may > not > > + * happen. > > + * > > + * Scenario 1: A flow was added to the desired flows, > but > > + * ofctrl_put() couldn't install the flow > and > > + * an SB update caused the 'del_f' to be > removed > > + * and re-added as 'f'. > > + * > > + * Scenario 2: In a single engine run, a flow 'del_f' > was > > + * added to the desired flow, removed from > the > > + * desired flow and re-added as 'f'. This > could > > + * happen after the commit c6c61b4e3462. > > + * > > + * Treat both the scenarios as valid scenarios and just > remove > > + * 'del_f' from the hmap - deleted_flows. > > + * update_installed_flows_by_track() will install 'f'. > > + */ > > > > - if (!f->installed_flow) { > > + } else if (!f->installed_flow) { > > /* f is not installed yet. */ > > replace_installed_to_desired(del_f->installed_flow, > del_f, f); > > } else { > > -- > > 2.29.2 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote: > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote: > > > > Thanks Numan for working on this issue! > > > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote: > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > Presently we do ovs_assert(del_f->installed_flows) in > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal > > > and if 'del_f' is not installed. But there are a couple of scenarios > > > this can happen: > > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() couldn't > > > install the flow (because of the rconn queue is full) and an SB update > > > caused 'F1' to be removed and re-added as 'F2'. > > > > > > > This seems not possible, because del_f->installed_flow is set whenever a > > "installed_flow" is added to the installed_flows table, regardless of > > whether it is sent to OVS or not. > > This can happen if ofctrl_can_put() returns false in which case > ovn-controller will > not call ofctrl_put() at all. There is a theoretical possibility here for sure. > > Let me know if you think otherwise. > Oh, sorry that I misunderstood. You are right that ofctrl_put() may return immediately without installing the desired flow F1, but if that happens, F1->installed_flow should be NULL, and then it is the same case as in scenario 2) below: when F1 is deleted, it would be destroyed in track_flow_del(). So we should never end up with a tracked deleted flow with installed_flow being NULL, right? The logic is, in theory, when a "desired but not yet installed" flow is being deleted, we should simply destroy it and remove it from tracking. Maybe we should check if there are other possibilities. > We observed this assertion during the upgrades of openshift to an ovn version > and there were a lot of "unreasonable timeout" warnings. > > The updated OVN version had the commit [1] which led me to think that > commit [1] can also cause this. > > > > > > 2. In a single engine run, a flow 'F1' was added to the desired flow, > > > removed from the desired flow and re-added as 'F2'. This could > > > happen after the commit [1]. > > > > In theory this should not happen either, because if F1 was added and then > > removed before it was installed, it would have been removed in > > track_flow_del(): > > > > if (!f->installed_flow) { > > /* If it is not installed yet, simply destroy it. */ > > desired_flow_destroy(f); > > return; > > } > > > > In fact this is what the comment is supposed to say, but the comment had a > > typo (my bad): > > > > /* del_f must have been installed, otherwise it should have been > > * removed during track_flow_add_or_modify. */ > > > > s/track_flow_add_or_modify/track_flow_del > > This typo really confused me :). From your comments the scenario 2 seems > not possible. > > But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE. > What do you think ? > > Thanks > Numan > > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper into > > the logic, but I think removing the assert may just hide the problem. > > > > > > > > The likelihood of (2) seems to be higher with datapath groups enabled. > > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in > > > few deployments after the commit [1]. This patch addressed this issue > > > by removing that assertion. Removing this assertion should not have > > > any side effects as the flow 'F2' will be installed. > > > > > > This patch is proposed based on the code inspection and the possibility > > > that the above mentioned scenarios can happen. The patch doesn't have > > > any test cases to reproduce these scenarios. > > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing flows > > with conj actions.") > > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > > > Signed-off-by: Numan Siddique <numans@ovn.org> > > > --- > > > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > index c29c3d180..9e960b034 100644 > > > --- a/controller/ofctrl.c > > > +++ b/controller/ofctrl.c > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct ovn_desired_flow_table > > *flow_table) > > > continue; > > > } > > > > > > - /* del_f must have been installed, otherwise it should have > > been > > > - * removed during track_flow_add_or_modify. */ > > > - ovs_assert(del_f->installed_flow); > > > + if (!del_f->installed_flow) { > > > + /* del_f must have been installed, otherwise it should > > have > > > + * been removed during track_flow_add_or_modify. > > > + * > > > + * But however there are a couple of scenarios this may > > not > > > + * happen. > > > + * > > > + * Scenario 1: A flow was added to the desired flows, > > but > > > + * ofctrl_put() couldn't install the flow > > and > > > + * an SB update caused the 'del_f' to be > > removed > > > + * and re-added as 'f'. > > > + * > > > + * Scenario 2: In a single engine run, a flow 'del_f' > > was > > > + * added to the desired flow, removed from > > the > > > + * desired flow and re-added as 'f'. This > > could > > > + * happen after the commit c6c61b4e3462. > > > + * > > > + * Treat both the scenarios as valid scenarios and just > > remove > > > + * 'del_f' from the hmap - deleted_flows. > > > + * update_installed_flows_by_track() will install 'f'. > > > + */ > > > > > > - if (!f->installed_flow) { > > > + } else if (!f->installed_flow) { > > > /* f is not installed yet. */ > > > replace_installed_to_desired(del_f->installed_flow, > > del_f, f); > > > } else { > > > -- > > > 2.29.2 > > > > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hzhou@ovn.org> wrote: > > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote: > > > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote: > > > > > > Thanks Numan for working on this issue! > > > > > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote: > > > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > > > Presently we do ovs_assert(del_f->installed_flows) in > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal > > > > and if 'del_f' is not installed. But there are a couple of scenarios > > > > this can happen: > > > > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() > couldn't > > > > install the flow (because of the rconn queue is full) and an SB update > > > > caused 'F1' to be removed and re-added as 'F2'. > > > > > > > > > > This seems not possible, because del_f->installed_flow is set whenever a > > > "installed_flow" is added to the installed_flows table, regardless of > > > whether it is sent to OVS or not. > > > > This can happen if ofctrl_can_put() returns false in which case > > ovn-controller will > > not call ofctrl_put() at all. There is a theoretical possibility here > for sure. > > > > Let me know if you think otherwise. > > > > Oh, sorry that I misunderstood. You are right that ofctrl_put() may return > immediately without installing the desired flow F1, but if that happens, > F1->installed_flow should be NULL, and then it is the same case as in > scenario 2) below: when F1 is deleted, it would be destroyed in > track_flow_del(). So we should never end up with a tracked deleted flow > with installed_flow being NULL, right? The logic is, in theory, when a > "desired but not yet installed" flow is being deleted, we should simply > destroy it and remove it from tracking. Maybe we should check if there are > other possibilities. You're right. F1 should be destroyed in track_flow_del(). I got access to the core dump and I can see below details of "f" and "del_f". Please let me know if this rings any bell ----- (gdb) frame 6 #6 0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283 283 hmap_expand_at(hmap, where); (gdb) print del_f $1 = (struct desired_flow *) 0x556948c670b0 (gdb) print *del_f $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540, 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts = 0x556948d96460, ofpacts_len = 432, cookie = 1418516517}, match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev = 0x556948c67100, next = 0x556948c67100}, references = {prev = 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0, installed_ref_list_node = {prev = 0x5569495bcd90, next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88, next = 0x556948d2bfc8}, is_deleted = true} (gdb) print f $7 = (struct desired_flow *) 0x556948d2bf40 (gdb) print *f $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0, 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts = 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517}, match_hmap_node = {hash = 2338012780, next = 0x556948da8070}, list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow = 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8, next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138, next = 0x556948e68658}, is_deleted = false} ------ Looks like there are some issues with physical flow handling. Seems to me an installed flow F1 is deleted and in the same loop, F2 is added to the tracked flow which has the same match, actions and cookie. Table id is 37 which is OFTABLE_REMOTE_OUTPUT. The SB resource can be either port binding or multicast group. Thanks for the comments so far. I will continue debugging. This patch can be dropped. Numan > > > We observed this assertion during the upgrades of openshift to an ovn > version > > and there were a lot of "unreasonable timeout" warnings. > > > > The updated OVN version had the commit [1] which led me to think that > > commit [1] can also cause this. > > > > > > > > > 2. In a single engine run, a flow 'F1' was added to the desired flow, > > > > removed from the desired flow and re-added as 'F2'. This could > > > > happen after the commit [1]. > > > > > > In theory this should not happen either, because if F1 was added and > then > > > removed before it was installed, it would have been removed in > > > track_flow_del(): > > > > > > if (!f->installed_flow) { > > > /* If it is not installed yet, simply destroy it. */ > > > desired_flow_destroy(f); > > > return; > > > } > > > > > > In fact this is what the comment is supposed to say, but the comment > had a > > > typo (my bad): > > > > > > /* del_f must have been installed, otherwise it should have > been > > > * removed during track_flow_add_or_modify. */ > > > > > > s/track_flow_add_or_modify/track_flow_del > > > > This typo really confused me :). From your comments the scenario 2 seems > > not possible. > > > > But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE. > > What do you think ? > > > > Thanks > > Numan > > > > > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper > into > > > the logic, but I think removing the assert may just hide the problem. > > > > > > > > > > > The likelihood of (2) seems to be higher with datapath groups enabled. > > > > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in > > > > few deployments after the commit [1]. This patch addressed this issue > > > > by removing that assertion. Removing this assertion should not have > > > > any side effects as the flow 'F2' will be installed. > > > > > > > > This patch is proposed based on the code inspection and the > possibility > > > > that the above mentioned scenarios can happen. The patch doesn't have > > > > any test cases to reproduce these scenarios. > > > > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing > flows > > > with conj actions.") > > > > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > > > > Signed-off-by: Numan Siddique <numans@ovn.org> > > > > --- > > > > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > > > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > > index c29c3d180..9e960b034 100644 > > > > --- a/controller/ofctrl.c > > > > +++ b/controller/ofctrl.c > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct > ovn_desired_flow_table > > > *flow_table) > > > > continue; > > > > } > > > > > > > > - /* del_f must have been installed, otherwise it should > have > > > been > > > > - * removed during track_flow_add_or_modify. */ > > > > - ovs_assert(del_f->installed_flow); > > > > + if (!del_f->installed_flow) { > > > > + /* del_f must have been installed, otherwise it > should > > > have > > > > + * been removed during track_flow_add_or_modify. > > > > + * > > > > + * But however there are a couple of scenarios this > may > > > not > > > > + * happen. > > > > + * > > > > + * Scenario 1: A flow was added to the desired > flows, > > > but > > > > + * ofctrl_put() couldn't install the > flow > > > and > > > > + * an SB update caused the 'del_f' to be > > > removed > > > > + * and re-added as 'f'. > > > > + * > > > > + * Scenario 2: In a single engine run, a flow > 'del_f' > > > was > > > > + * added to the desired flow, removed > from > > > the > > > > + * desired flow and re-added as 'f'. > This > > > could > > > > + * happen after the commit c6c61b4e3462. > > > > + * > > > > + * Treat both the scenarios as valid scenarios and > just > > > remove > > > > + * 'del_f' from the hmap - deleted_flows. > > > > + * update_installed_flows_by_track() will install > 'f'. > > > > + */ > > > > > > > > - if (!f->installed_flow) { > > > > + } else if (!f->installed_flow) { > > > > /* f is not installed yet. */ > > > > replace_installed_to_desired(del_f->installed_flow, > > > del_f, f); > > > > } else { > > > > -- > > > > 2.29.2 > > > > > > > > _______________________________________________ > > > > dev mailing list > > > > dev@openvswitch.org > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <numans@ovn.org> wrote: > > On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hzhou@ovn.org> wrote: > > > > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote: > > > > > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote: > > > > > > > > Thanks Numan for working on this issue! > > > > > > > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote: > > > > > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > > > > > Presently we do ovs_assert(del_f->installed_flows) in > > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal > > > > > and if 'del_f' is not installed. But there are a couple of scenarios > > > > > this can happen: > > > > > > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() > > couldn't > > > > > install the flow (because of the rconn queue is full) and an SB update > > > > > caused 'F1' to be removed and re-added as 'F2'. > > > > > > > > > > > > > This seems not possible, because del_f->installed_flow is set whenever a > > > > "installed_flow" is added to the installed_flows table, regardless of > > > > whether it is sent to OVS or not. > > > > > > This can happen if ofctrl_can_put() returns false in which case > > > ovn-controller will > > > not call ofctrl_put() at all. There is a theoretical possibility here > > for sure. > > > > > > Let me know if you think otherwise. > > > > > > > Oh, sorry that I misunderstood. You are right that ofctrl_put() may return > > immediately without installing the desired flow F1, but if that happens, > > F1->installed_flow should be NULL, and then it is the same case as in > > scenario 2) below: when F1 is deleted, it would be destroyed in > > track_flow_del(). So we should never end up with a tracked deleted flow > > with installed_flow being NULL, right? The logic is, in theory, when a > > "desired but not yet installed" flow is being deleted, we should simply > > destroy it and remove it from tracking. Maybe we should check if there are > > other possibilities. > > You're right. F1 should be destroyed in track_flow_del(). > > I got access to the core dump and I can see below details of "f" and "del_f". > > Please let me know if this rings any bell > > ----- > (gdb) frame 6 > #6 0x00005569459bfbaa in merge_tracked_flows > (flow_table=0x5569470c5b40) at > /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283 > 283 hmap_expand_at(hmap, where); > (gdb) print del_f > $1 = (struct desired_flow *) 0x556948c670b0 > (gdb) print *del_f > $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = > 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540, > 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts = > 0x556948d96460, ofpacts_len = 432, cookie = 1418516517}, > match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev > = 0x556948c67100, next = 0x556948c67100}, references = {prev = > 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0, > installed_ref_list_node = {prev = 0x5569495bcd90, > next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88, > next = 0x556948d2bfc8}, is_deleted = true} > Looking at the content of the del_f, the installed_ref_list_node is not pointing to the field of the flow itself, which means at least it was in some installed_flow's desired_refs list, which means it was installed at least at some point before. Now that the "installed_flow" is NULL, it is possible that the desired flow was installed but then unlinked from the installed flow. But by reviewing the code I didn't find any path that "unlink_installed_to_desired()" or "replace_installed_to_desired()" could happen to this flow before this point. If it is unlinked it should also be removed from the track list and also from the deleted_flows hmap and destroyed so it should never be found again. Of course it is also possible that this desired_flow data structure is corrupted so the fields nstalled_ref_list_node and installed_flow just have inconsistent data. I will keep looking into the code. At the same time, is this reproducible? We could add some more logs to help debugging if this happens again. > > (gdb) print f > $7 = (struct desired_flow *) 0x556948d2bf40 > > (gdb) print *f > $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = > 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0, > 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts = > 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517}, > match_hmap_node = {hash = 2338012780, next = 0x556948da8070}, > list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references > = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow = > 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8, > next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138, > next = 0x556948e68658}, is_deleted = false} > > ------ > > Looks like there are some issues with physical flow handling. Seems > to me an installed flow F1 is deleted and in the same loop, F2 is > added > to the tracked flow which has the same match, actions and cookie. > > Table id is 37 which is OFTABLE_REMOTE_OUTPUT. The SB resource can be > either port binding or multicast group. > > Thanks for the comments so far. I will continue debugging. This patch > can be dropped. > > Numan > > > > > > We observed this assertion during the upgrades of openshift to an ovn > > version > > > and there were a lot of "unreasonable timeout" warnings. > > > > > > The updated OVN version had the commit [1] which led me to think that > > > commit [1] can also cause this. > > > > > > > > > > > > 2. In a single engine run, a flow 'F1' was added to the desired flow, > > > > > removed from the desired flow and re-added as 'F2'. This could > > > > > happen after the commit [1]. > > > > > > > > In theory this should not happen either, because if F1 was added and > > then > > > > removed before it was installed, it would have been removed in > > > > track_flow_del(): > > > > > > > > if (!f->installed_flow) { > > > > /* If it is not installed yet, simply destroy it. */ > > > > desired_flow_destroy(f); > > > > return; > > > > } > > > > > > > > In fact this is what the comment is supposed to say, but the comment > > had a > > > > typo (my bad): > > > > > > > > /* del_f must have been installed, otherwise it should have > > been > > > > * removed during track_flow_add_or_modify. */ > > > > > > > > s/track_flow_add_or_modify/track_flow_del > > > > > > This typo really confused me :). From your comments the scenario 2 seems > > > not possible. > > > > > > But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE. > > > What do you think ? > > > > > > Thanks > > > Numan > > > > > > > > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper > > into > > > > the logic, but I think removing the assert may just hide the problem. > > > > > > > > > > > > > > The likelihood of (2) seems to be higher with datapath groups enabled. > > > > > > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in > > > > > few deployments after the commit [1]. This patch addressed this issue > > > > > by removing that assertion. Removing this assertion should not have > > > > > any side effects as the flow 'F2' will be installed. > > > > > > > > > > This patch is proposed based on the code inspection and the > > possibility > > > > > that the above mentioned scenarios can happen. The patch doesn't have > > > > > any test cases to reproduce these scenarios. > > > > > > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing > > flows > > > > with conj actions.") > > > > > > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > > > > > Signed-off-by: Numan Siddique <numans@ovn.org> > > > > > --- > > > > > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > > > > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > > > index c29c3d180..9e960b034 100644 > > > > > --- a/controller/ofctrl.c > > > > > +++ b/controller/ofctrl.c > > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct > > ovn_desired_flow_table > > > > *flow_table) > > > > > continue; > > > > > } > > > > > > > > > > - /* del_f must have been installed, otherwise it should > > have > > > > been > > > > > - * removed during track_flow_add_or_modify. */ > > > > > - ovs_assert(del_f->installed_flow); > > > > > + if (!del_f->installed_flow) { > > > > > + /* del_f must have been installed, otherwise it > > should > > > > have > > > > > + * been removed during track_flow_add_or_modify. > > > > > + * > > > > > + * But however there are a couple of scenarios this > > may > > > > not > > > > > + * happen. > > > > > + * > > > > > + * Scenario 1: A flow was added to the desired > > flows, > > > > but > > > > > + * ofctrl_put() couldn't install the > > flow > > > > and > > > > > + * an SB update caused the 'del_f' to be > > > > removed > > > > > + * and re-added as 'f'. > > > > > + * > > > > > + * Scenario 2: In a single engine run, a flow > > 'del_f' > > > > was > > > > > + * added to the desired flow, removed > > from > > > > the > > > > > + * desired flow and re-added as 'f'. > > This > > > > could > > > > > + * happen after the commit c6c61b4e3462. > > > > > + * > > > > > + * Treat both the scenarios as valid scenarios and > > just > > > > remove > > > > > + * 'del_f' from the hmap - deleted_flows. > > > > > + * update_installed_flows_by_track() will install > > 'f'. > > > > > + */ > > > > > > > > > > - if (!f->installed_flow) { > > > > > + } else if (!f->installed_flow) { > > > > > /* f is not installed yet. */ > > > > > replace_installed_to_desired(del_f->installed_flow, > > > > del_f, f); > > > > > } else { > > > > > -- > > > > > 2.29.2 > > > > > > > > > > _______________________________________________ > > > > > dev mailing list > > > > > dev@openvswitch.org > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ > > > > dev mailing list > > > > dev@openvswitch.org > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
On Wed, May 5, 2021 at 2:25 AM Han Zhou <hzhou@ovn.org> wrote: > > On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <numans@ovn.org> wrote: > > > > On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hzhou@ovn.org> wrote: > > > > > > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote: > > > > > > > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote: > > > > > > > > > > Thanks Numan for working on this issue! > > > > > > > > > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote: > > > > > > > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > > > > > > > Presently we do ovs_assert(del_f->installed_flows) in > > > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are > equal > > > > > > and if 'del_f' is not installed. But there are a couple of > scenarios > > > > > > this can happen: > > > > > > > > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() > > > couldn't > > > > > > install the flow (because of the rconn queue is full) and an SB > update > > > > > > caused 'F1' to be removed and re-added as 'F2'. > > > > > > > > > > > > > > > > This seems not possible, because del_f->installed_flow is set > whenever a > > > > > "installed_flow" is added to the installed_flows table, regardless > of > > > > > whether it is sent to OVS or not. > > > > > > > > This can happen if ofctrl_can_put() returns false in which case > > > > ovn-controller will > > > > not call ofctrl_put() at all. There is a theoretical possibility here > > > for sure. > > > > > > > > Let me know if you think otherwise. > > > > > > > > > > Oh, sorry that I misunderstood. You are right that ofctrl_put() may > return > > > immediately without installing the desired flow F1, but if that happens, > > > F1->installed_flow should be NULL, and then it is the same case as in > > > scenario 2) below: when F1 is deleted, it would be destroyed in > > > track_flow_del(). So we should never end up with a tracked deleted flow > > > with installed_flow being NULL, right? The logic is, in theory, when a > > > "desired but not yet installed" flow is being deleted, we should simply > > > destroy it and remove it from tracking. Maybe we should check if there > are > > > other possibilities. > > > > You're right. F1 should be destroyed in track_flow_del(). > > > > I got access to the core dump and I can see below details of "f" and > "del_f". > > > > Please let me know if this rings any bell > > > > ----- > > (gdb) frame 6 > > #6 0x00005569459bfbaa in merge_tracked_flows > > (flow_table=0x5569470c5b40) at > > > /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283 > > 283 hmap_expand_at(hmap, where); > > (gdb) print del_f > > $1 = (struct desired_flow *) 0x556948c670b0 > > (gdb) print *del_f > > $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = > > 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540, > > 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts = > > 0x556948d96460, ofpacts_len = 432, cookie = 1418516517}, > > match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev > > = 0x556948c67100, next = 0x556948c67100}, references = {prev = > > 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0, > > installed_ref_list_node = {prev = 0x5569495bcd90, > > next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88, > > next = 0x556948d2bfc8}, is_deleted = true} > > > Looking at the content of the del_f, the installed_ref_list_node is not > pointing to the field of the flow itself, which means at least it was in > some installed_flow's desired_refs list, which means it was installed at > least at some point before. Now that the "installed_flow" is NULL, it is > possible that the desired flow was installed but then unlinked from the > installed flow. But by reviewing the code I didn't find any path that > "unlink_installed_to_desired()" or "replace_installed_to_desired()" could > happen to this flow before this point. If it is unlinked it should also be > removed from the track list and also from the deleted_flows hmap and > destroyed so it should never be found again. Of course it is also possible > that this desired_flow data structure is corrupted so the fields > nstalled_ref_list_node and installed_flow just have inconsistent data. I > will keep looking into the code. At the same time, is this reproducible? We > could add some more logs to help debugging if this happens again. > In one of the environments, his issue was seen when upgrading an OCP cluster. and there were some failures related to memory and OOM had kicked in. I am wondering if the issue is related to OOM. In order to address the OOM issues, few kernel patches were required. After addressing these issues, ovn-controller assert is not seen after that. Thanks for looking into the backtraces. Numan > > > > (gdb) print f > > $7 = (struct desired_flow *) 0x556948d2bf40 > > > > (gdb) print *f > > $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow = > > 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0, > > 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts = > > 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517}, > > match_hmap_node = {hash = 2338012780, next = 0x556948da8070}, > > list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references > > = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow = > > 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8, > > next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138, > > next = 0x556948e68658}, is_deleted = false} > > > > ------ > > > > Looks like there are some issues with physical flow handling. Seems > > to me an installed flow F1 is deleted and in the same loop, F2 is > > added > > to the tracked flow which has the same match, actions and cookie. > > > > Table id is 37 which is OFTABLE_REMOTE_OUTPUT. The SB resource can be > > either port binding or multicast group. > > > > Thanks for the comments so far. I will continue debugging. This patch > > can be dropped. > > > > Numan > > > > > > > > > We observed this assertion during the upgrades of openshift to an ovn > > > version > > > > and there were a lot of "unreasonable timeout" warnings. > > > > > > > > The updated OVN version had the commit [1] which led me to think that > > > > commit [1] can also cause this. > > > > > > > > > > > > > > > 2. In a single engine run, a flow 'F1' was added to the desired > flow, > > > > > > removed from the desired flow and re-added as 'F2'. This could > > > > > > happen after the commit [1]. > > > > > > > > > > In theory this should not happen either, because if F1 was added and > > > then > > > > > removed before it was installed, it would have been removed in > > > > > track_flow_del(): > > > > > > > > > > if (!f->installed_flow) { > > > > > /* If it is not installed yet, simply destroy it. */ > > > > > desired_flow_destroy(f); > > > > > return; > > > > > } > > > > > > > > > > In fact this is what the comment is supposed to say, but the comment > > > had a > > > > > typo (my bad): > > > > > > > > > > /* del_f must have been installed, otherwise it should > have > > > been > > > > > * removed during track_flow_add_or_modify. */ > > > > > > > > > > s/track_flow_add_or_modify/track_flow_del > > > > > > > > This typo really confused me :). From your comments the scenario 2 > seems > > > > not possible. > > > > > > > > But I still think scenario 1 can happen if ofctrl_can_put() returns > FALSE. > > > > What do you think ? > > > > > > > > Thanks > > > > Numan > > > > > > > > > > > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper > > > into > > > > > the logic, but I think removing the assert may just hide the > problem. > > > > > > > > > > > > > > > > > The likelihood of (2) seems to be higher with datapath groups > enabled. > > > > > > > > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in > > > > > > few deployments after the commit [1]. This patch addressed this > issue > > > > > > by removing that assertion. Removing this assertion should not > have > > > > > > any side effects as the flow 'F2' will be installed. > > > > > > > > > > > > This patch is proposed based on the code inspection and the > > > possibility > > > > > > that the above mentioned scenarios can happen. The patch doesn't > have > > > > > > any test cases to reproduce these scenarios. > > > > > > > > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood > removing > > > flows > > > > > with conj actions.") > > > > > > > > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502 > > > > > > Signed-off-by: Numan Siddique <numans@ovn.org> > > > > > > --- > > > > > > controller/ofctrl.c | 26 ++++++++++++++++++++++---- > > > > > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > > > > > > index c29c3d180..9e960b034 100644 > > > > > > --- a/controller/ofctrl.c > > > > > > +++ b/controller/ofctrl.c > > > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct > > > ovn_desired_flow_table > > > > > *flow_table) > > > > > > continue; > > > > > > } > > > > > > > > > > > > - /* del_f must have been installed, otherwise it > should > > > have > > > > > been > > > > > > - * removed during track_flow_add_or_modify. */ > > > > > > - ovs_assert(del_f->installed_flow); > > > > > > + if (!del_f->installed_flow) { > > > > > > + /* del_f must have been installed, otherwise it > > > should > > > > > have > > > > > > + * been removed during track_flow_add_or_modify. > > > > > > + * > > > > > > + * But however there are a couple of scenarios > this > > > may > > > > > not > > > > > > + * happen. > > > > > > + * > > > > > > + * Scenario 1: A flow was added to the desired > > > flows, > > > > > but > > > > > > + * ofctrl_put() couldn't install the > > > flow > > > > > and > > > > > > + * an SB update caused the 'del_f' > to be > > > > > removed > > > > > > + * and re-added as 'f'. > > > > > > + * > > > > > > + * Scenario 2: In a single engine run, a flow > > > 'del_f' > > > > > was > > > > > > + * added to the desired flow, > removed > > > from > > > > > the > > > > > > + * desired flow and re-added as 'f'. > > > This > > > > > could > > > > > > + * happen after the commit > c6c61b4e3462. > > > > > > + * > > > > > > + * Treat both the scenarios as valid scenarios > and > > > just > > > > > remove > > > > > > + * 'del_f' from the hmap - deleted_flows. > > > > > > + * update_installed_flows_by_track() will install > > > 'f'. > > > > > > + */ > > > > > > > > > > > > - if (!f->installed_flow) { > > > > > > + } else if (!f->installed_flow) { > > > > > > /* f is not installed yet. */ > > > > > > > replace_installed_to_desired(del_f->installed_flow, > > > > > del_f, f); > > > > > > } else { > > > > > > -- > > > > > > 2.29.2 > > > > > > > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > dev@openvswitch.org > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > _______________________________________________ > > > > > dev mailing list > > > > > dev@openvswitch.org > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
diff --git a/controller/ofctrl.c b/controller/ofctrl.c index c29c3d180..9e960b034 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct ovn_desired_flow_table *flow_table) continue; } - /* del_f must have been installed, otherwise it should have been - * removed during track_flow_add_or_modify. */ - ovs_assert(del_f->installed_flow); + if (!del_f->installed_flow) { + /* del_f must have been installed, otherwise it should have + * been removed during track_flow_add_or_modify. + * + * But however there are a couple of scenarios this may not + * happen. + * + * Scenario 1: A flow was added to the desired flows, but + * ofctrl_put() couldn't install the flow and + * an SB update caused the 'del_f' to be removed + * and re-added as 'f'. + * + * Scenario 2: In a single engine run, a flow 'del_f' was + * added to the desired flow, removed from the + * desired flow and re-added as 'f'. This could + * happen after the commit c6c61b4e3462. + * + * Treat both the scenarios as valid scenarios and just remove + * 'del_f' from the hmap - deleted_flows. + * update_installed_flows_by_track() will install 'f'. + */ - if (!f->installed_flow) { + } else if (!f->installed_flow) { /* f is not installed yet. */ replace_installed_to_desired(del_f->installed_flow, del_f, f); } else {