Message ID | 1564737778-27292-1-git-send-email-dceara@redhat.com |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev,ovn] ovn-controller: Fix IP engine run with in-flight messages | expand |
On Fri, Aug 2, 2019 at 2:53 PM Dumitru Ceara <dceara@redhat.com> wrote: > When trying to incrementally process changes even if there are in-flight > messages we were incorrectly setting the engine_aborted variable to the > value returned by engine_run. However, engine_run returns true if the > run was completed. > > This was causing discrepancies between logical flows and openflow flows > due to the fact that the rerun wasn't happening after an aborted run. > > In order to avoid confusion the engine_aborted variable is now renamed to > engine_run_done thus avoiding the negated logic. > > CC: Han Zhou <hzhou8@ebay.com> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency caused > by recompute.") > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > I tested this patch multiple times with -j5 and -j10 options. I don't see test failures now. Without this patch, atleast one test fails for me 100% of the time when I run tests with -j5. Tested-by: Numan Siddique <nusiddiq@redhat.com> I didn't get the chance to look into the code changes carefully. Thanks Numan --- > controller/ovn-controller.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index ad33d17..15b9f4e 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1896,7 +1896,7 @@ main(int argc, char *argv[]) > > uint64_t engine_run_id = 0; > uint64_t old_engine_run_id = 0; > - bool engine_aborted = false; > + bool engine_run_done = true; > > unsigned int ovs_cond_seqno = UINT_MAX; > unsigned int ovnsb_cond_seqno = UINT_MAX; > @@ -1997,14 +1997,14 @@ main(int argc, char *argv[]) > * this round of engine_run and continue > processing > * acculated changes incrementally later when > * ofctrl_can_put() returns true. */ > - if (!engine_aborted) { > + if (engine_run_done) { > engine_set_abort_recompute(true); > - engine_aborted = > engine_run(&en_flow_output, > - > ++engine_run_id); > + engine_run_done = > engine_run(&en_flow_output, > + > ++engine_run_id); > } > } else { > engine_set_abort_recompute(false); > - engine_aborted = false; > + engine_run_done = true; > engine_run(&en_flow_output, ++engine_run_id); > } > } > @@ -2048,8 +2048,8 @@ main(int argc, char *argv[]) > } > > } > - if (old_engine_run_id == engine_run_id || engine_aborted) { > - if (engine_aborted || engine_need_run(&en_flow_output)) { > + if (old_engine_run_id == engine_run_id || !engine_run_done) { > + if (!engine_run_done || engine_need_run(&en_flow_output)) > { > VLOG_DBG("engine did not run, force recompute next > time: " > "br_int %p, chassis %p", br_int, chassis); > engine_set_force_recompute(true); > -- > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
Good catch! Acked-by: Mark Michelson <mmichels@redhat.com> On 8/2/19 5:22 AM, Dumitru Ceara wrote: > When trying to incrementally process changes even if there are in-flight > messages we were incorrectly setting the engine_aborted variable to the > value returned by engine_run. However, engine_run returns true if the > run was completed. > > This was causing discrepancies between logical flows and openflow flows > due to the fact that the rerun wasn't happening after an aborted run. > > In order to avoid confusion the engine_aborted variable is now renamed to > engine_run_done thus avoiding the negated logic. > > CC: Han Zhou <hzhou8@ebay.com> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency caused by recompute.") > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > --- > controller/ovn-controller.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index ad33d17..15b9f4e 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1896,7 +1896,7 @@ main(int argc, char *argv[]) > > uint64_t engine_run_id = 0; > uint64_t old_engine_run_id = 0; > - bool engine_aborted = false; > + bool engine_run_done = true; > > unsigned int ovs_cond_seqno = UINT_MAX; > unsigned int ovnsb_cond_seqno = UINT_MAX; > @@ -1997,14 +1997,14 @@ main(int argc, char *argv[]) > * this round of engine_run and continue processing > * acculated changes incrementally later when > * ofctrl_can_put() returns true. */ > - if (!engine_aborted) { > + if (engine_run_done) { > engine_set_abort_recompute(true); > - engine_aborted = engine_run(&en_flow_output, > - ++engine_run_id); > + engine_run_done = engine_run(&en_flow_output, > + ++engine_run_id); > } > } else { > engine_set_abort_recompute(false); > - engine_aborted = false; > + engine_run_done = true; > engine_run(&en_flow_output, ++engine_run_id); > } > } > @@ -2048,8 +2048,8 @@ main(int argc, char *argv[]) > } > > } > - if (old_engine_run_id == engine_run_id || engine_aborted) { > - if (engine_aborted || engine_need_run(&en_flow_output)) { > + if (old_engine_run_id == engine_run_id || !engine_run_done) { > + if (!engine_run_done || engine_need_run(&en_flow_output)) { > VLOG_DBG("engine did not run, force recompute next time: " > "br_int %p, chassis %p", br_int, chassis); > engine_set_force_recompute(true); >
On Fri, Aug 2, 2019 at 2:23 AM Dumitru Ceara <dceara@redhat.com> wrote: > > When trying to incrementally process changes even if there are in-flight > messages we were incorrectly setting the engine_aborted variable to the > value returned by engine_run. However, engine_run returns true if the > run was completed. > > This was causing discrepancies between logical flows and openflow flows > due to the fact that the rerun wasn't happening after an aborted run. > > In order to avoid confusion the engine_aborted variable is now renamed to > engine_run_done thus avoiding the negated logic. > > CC: Han Zhou <hzhou8@ebay.com> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency caused by recompute.") > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > --- > controller/ovn-controller.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index ad33d17..15b9f4e 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -1896,7 +1896,7 @@ main(int argc, char *argv[]) > > uint64_t engine_run_id = 0; > uint64_t old_engine_run_id = 0; > - bool engine_aborted = false; > + bool engine_run_done = true; > > unsigned int ovs_cond_seqno = UINT_MAX; > unsigned int ovnsb_cond_seqno = UINT_MAX; > @@ -1997,14 +1997,14 @@ main(int argc, char *argv[]) > * this round of engine_run and continue processing > * acculated changes incrementally later when > * ofctrl_can_put() returns true. */ > - if (!engine_aborted) { > + if (engine_run_done) { > engine_set_abort_recompute(true); > - engine_aborted = engine_run(&en_flow_output, > - ++engine_run_id); > + engine_run_done = engine_run(&en_flow_output, > + ++engine_run_id); > } > } else { > engine_set_abort_recompute(false); > - engine_aborted = false; > + engine_run_done = true; > engine_run(&en_flow_output, ++engine_run_id); > } > } > @@ -2048,8 +2048,8 @@ main(int argc, char *argv[]) > } > > } > - if (old_engine_run_id == engine_run_id || engine_aborted) { > - if (engine_aborted || engine_need_run(&en_flow_output)) { > + if (old_engine_run_id == engine_run_id || !engine_run_done) { > + if (!engine_run_done || engine_need_run(&en_flow_output)) { > VLOG_DBG("engine did not run, force recompute next time: " > "br_int %p, chassis %p", br_int, chassis); > engine_set_force_recompute(true); > -- > 1.8.3.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev Acked-by: Han Zhou <hzhou8@ebay.com> Thanks Dumitru for this critical fix! It was my silly mistake. I tried to reproduce the test case failure you reported, but it still didn't happen on my laptop, even with -j10. Later we may need to add a test case that triggers a lot of flows being installed so that the scenario with in-flight messages are easily covered. The original patch is still waiting to be backported to 2.12. I will send a new version followed by your fix on 2.12.
On Mon, Aug 5, 2019 at 2:11 AM Han Zhou <zhouhan@gmail.com> wrote: > On Fri, Aug 2, 2019 at 2:23 AM Dumitru Ceara <dceara@redhat.com> wrote: > > > > When trying to incrementally process changes even if there are in-flight > > messages we were incorrectly setting the engine_aborted variable to the > > value returned by engine_run. However, engine_run returns true if the > > run was completed. > > > > This was causing discrepancies between logical flows and openflow flows > > due to the fact that the rerun wasn't happening after an aborted run. > > > > In order to avoid confusion the engine_aborted variable is now renamed to > > engine_run_done thus avoiding the negated logic. > > > > CC: Han Zhou <hzhou8@ebay.com> > > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency > caused by recompute.") > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > > --- > > controller/ovn-controller.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index ad33d17..15b9f4e 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -1896,7 +1896,7 @@ main(int argc, char *argv[]) > > > > uint64_t engine_run_id = 0; > > uint64_t old_engine_run_id = 0; > > - bool engine_aborted = false; > > + bool engine_run_done = true; > > > > unsigned int ovs_cond_seqno = UINT_MAX; > > unsigned int ovnsb_cond_seqno = UINT_MAX; > > @@ -1997,14 +1997,14 @@ main(int argc, char *argv[]) > > * this round of engine_run and continue > processing > > * acculated changes incrementally later > when > > * ofctrl_can_put() returns true. */ > > - if (!engine_aborted) { > > + if (engine_run_done) { > > engine_set_abort_recompute(true); > > - engine_aborted = > engine_run(&en_flow_output, > > - > ++engine_run_id); > > + engine_run_done = > engine_run(&en_flow_output, > > + > ++engine_run_id); > > } > > } else { > > engine_set_abort_recompute(false); > > - engine_aborted = false; > > + engine_run_done = true; > > engine_run(&en_flow_output, > ++engine_run_id); > > } > > } > > @@ -2048,8 +2048,8 @@ main(int argc, char *argv[]) > > } > > > > } > > - if (old_engine_run_id == engine_run_id || engine_aborted) { > > - if (engine_aborted || engine_need_run(&en_flow_output)) > { > > + if (old_engine_run_id == engine_run_id || !engine_run_done) > { > > + if (!engine_run_done || > engine_need_run(&en_flow_output)) { > > VLOG_DBG("engine did not run, force recompute next > time: " > > "br_int %p, chassis %p", br_int, chassis); > > engine_set_force_recompute(true); > > -- > > 1.8.3.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Acked-by: Han Zhou <hzhou8@ebay.com> > Thanks. I applied this patch to master. Numan > > Thanks Dumitru for this critical fix! It was my silly mistake. I tried to > reproduce the test case failure you reported, but it still didn't happen on > my laptop, even with -j10. Later we may need to add a test case that > triggers a lot of flows being installed so that the scenario with in-flight > messages are easily covered. > > The original patch is still waiting to be backported to 2.12. I will send a > new version followed by your fix on 2.12. > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On Mon, Aug 5, 2019 at 8:52 AM Numan Siddique <nusiddiq@redhat.com> wrote: > > > > On Mon, Aug 5, 2019 at 2:11 AM Han Zhou <zhouhan@gmail.com> wrote: >> >> On Fri, Aug 2, 2019 at 2:23 AM Dumitru Ceara <dceara@redhat.com> wrote: >> > >> > When trying to incrementally process changes even if there are in-flight >> > messages we were incorrectly setting the engine_aborted variable to the >> > value returned by engine_run. However, engine_run returns true if the >> > run was completed. >> > >> > This was causing discrepancies between logical flows and openflow flows >> > due to the fact that the rerun wasn't happening after an aborted run. >> > >> > In order to avoid confusion the engine_aborted variable is now renamed to >> > engine_run_done thus avoiding the negated logic. >> > >> > CC: Han Zhou <hzhou8@ebay.com> >> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency >> caused by recompute.") >> > Signed-off-by: Dumitru Ceara <dceara@redhat.com> >> > --- >> > controller/ovn-controller.c | 14 +++++++------- >> > 1 file changed, 7 insertions(+), 7 deletions(-) >> > >> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> > index ad33d17..15b9f4e 100644 >> > --- a/controller/ovn-controller.c >> > +++ b/controller/ovn-controller.c >> > @@ -1896,7 +1896,7 @@ main(int argc, char *argv[]) >> > >> > uint64_t engine_run_id = 0; >> > uint64_t old_engine_run_id = 0; >> > - bool engine_aborted = false; >> > + bool engine_run_done = true; >> > >> > unsigned int ovs_cond_seqno = UINT_MAX; >> > unsigned int ovnsb_cond_seqno = UINT_MAX; >> > @@ -1997,14 +1997,14 @@ main(int argc, char *argv[]) >> > * this round of engine_run and continue >> processing >> > * acculated changes incrementally later when >> > * ofctrl_can_put() returns true. */ >> > - if (!engine_aborted) { >> > + if (engine_run_done) { >> > engine_set_abort_recompute(true); >> > - engine_aborted = >> engine_run(&en_flow_output, >> > - >> ++engine_run_id); >> > + engine_run_done = >> engine_run(&en_flow_output, >> > + >> ++engine_run_id); >> > } >> > } else { >> > engine_set_abort_recompute(false); >> > - engine_aborted = false; >> > + engine_run_done = true; >> > engine_run(&en_flow_output, ++engine_run_id); >> > } >> > } >> > @@ -2048,8 +2048,8 @@ main(int argc, char *argv[]) >> > } >> > >> > } >> > - if (old_engine_run_id == engine_run_id || engine_aborted) { >> > - if (engine_aborted || engine_need_run(&en_flow_output)) { >> > + if (old_engine_run_id == engine_run_id || !engine_run_done) { >> > + if (!engine_run_done || >> engine_need_run(&en_flow_output)) { >> > VLOG_DBG("engine did not run, force recompute next >> time: " >> > "br_int %p, chassis %p", br_int, chassis); >> > engine_set_force_recompute(true); >> > -- >> > 1.8.3.1 >> > >> > _______________________________________________ >> > dev mailing list >> > dev@openvswitch.org >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> Acked-by: Han Zhou <hzhou8@ebay.com> > > > > Thanks. > I applied this patch to master. > > Numan > >> >> >> Thanks Dumitru for this critical fix! It was my silly mistake. I tried to >> reproduce the test case failure you reported, but it still didn't happen on >> my laptop, even with -j10. Later we may need to add a test case that >> triggers a lot of flows being installed so that the scenario with in-flight >> messages are easily covered. >> >> The original patch is still waiting to be backported to 2.12. I will send a >> new version followed by your fix on 2.12. >> _______________________________________________ Thanks Han, Mark, Numan for reviewing and testing the patch!
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index ad33d17..15b9f4e 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1896,7 +1896,7 @@ main(int argc, char *argv[]) uint64_t engine_run_id = 0; uint64_t old_engine_run_id = 0; - bool engine_aborted = false; + bool engine_run_done = true; unsigned int ovs_cond_seqno = UINT_MAX; unsigned int ovnsb_cond_seqno = UINT_MAX; @@ -1997,14 +1997,14 @@ main(int argc, char *argv[]) * this round of engine_run and continue processing * acculated changes incrementally later when * ofctrl_can_put() returns true. */ - if (!engine_aborted) { + if (engine_run_done) { engine_set_abort_recompute(true); - engine_aborted = engine_run(&en_flow_output, - ++engine_run_id); + engine_run_done = engine_run(&en_flow_output, + ++engine_run_id); } } else { engine_set_abort_recompute(false); - engine_aborted = false; + engine_run_done = true; engine_run(&en_flow_output, ++engine_run_id); } } @@ -2048,8 +2048,8 @@ main(int argc, char *argv[]) } } - if (old_engine_run_id == engine_run_id || engine_aborted) { - if (engine_aborted || engine_need_run(&en_flow_output)) { + if (old_engine_run_id == engine_run_id || !engine_run_done) { + if (!engine_run_done || engine_need_run(&en_flow_output)) { VLOG_DBG("engine did not run, force recompute next time: " "br_int %p, chassis %p", br_int, chassis); engine_set_force_recompute(true);
When trying to incrementally process changes even if there are in-flight messages we were incorrectly setting the engine_aborted variable to the value returned by engine_run. However, engine_run returns true if the run was completed. This was causing discrepancies between logical flows and openflow flows due to the fact that the rerun wasn't happening after an aborted run. In order to avoid confusion the engine_aborted variable is now renamed to engine_run_done thus avoiding the negated logic. CC: Han Zhou <hzhou8@ebay.com> Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency caused by recompute.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> --- controller/ovn-controller.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)