Message ID | 20201106033641.1848523-5-blp@ovn.org |
---|---|
State | Superseded |
Headers | show |
Series | Add DDlog implementation of ovn-northd | expand |
On 11/6/20 4:36 AM, Ben Pfaff wrote: > Many of these could be replaced by "ovn-nbctl sync". Some weren't > really needed at all because they were adjacent to something that itself > called sync or otherwise used --wait. Some were more appropriately > done with explicit waits for what was really needed. > > I left some "sleep"s. Some were because they were "negative" sleeps: > they were giving time for something to happen that should *not* happen > (in other words, if you wait for it to happen, you'll wait forever, > unless there's a bug). Some were because I didn't know how to properly > wait for what they were waiting for, or because I didn't understand > the circumstances deeply enough. > > Signed-off-by: Ben Pfaff <blp@ovn.org> > --- > tests/ovn-northd.at | 7 ++ > tests/ovn.at | 167 ++++++++++++-------------------------------- > 2 files changed, 52 insertions(+), 122 deletions(-) > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index 949a8eee054e..5d670233561e 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \ > Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ > health_check @hc > wait_row_count Service_Monitor 2 > +check ovn-nbctl --wait=hv sync This should be "--wait=sb". ovn-controller is not started for tests in ovn-northd.at and "--wait=hv" would return immediately. This applies to all "ovn-nbctl --wait=hv sync" in ovn-northd.at. Thanks, Dumitru
On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote: > On 11/6/20 4:36 AM, Ben Pfaff wrote: > > Many of these could be replaced by "ovn-nbctl sync". Some weren't > > really needed at all because they were adjacent to something that itself > > called sync or otherwise used --wait. Some were more appropriately > > done with explicit waits for what was really needed. > > > > I left some "sleep"s. Some were because they were "negative" sleeps: > > they were giving time for something to happen that should *not* happen > > (in other words, if you wait for it to happen, you'll wait forever, > > unless there's a bug). Some were because I didn't know how to properly > > wait for what they were waiting for, or because I didn't understand > > the circumstances deeply enough. > > > > Signed-off-by: Ben Pfaff <blp@ovn.org> > > --- > > tests/ovn-northd.at | 7 ++ > > tests/ovn.at | 167 ++++++++++++-------------------------------- > > 2 files changed, 52 insertions(+), 122 deletions(-) > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index 949a8eee054e..5d670233561e 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \ > > Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ > > health_check @hc > > wait_row_count Service_Monitor 2 > > +check ovn-nbctl --wait=hv sync > > This should be "--wait=sb". ovn-controller is not started for tests in > ovn-northd.at and "--wait=hv" would return immediately. This applies to > all "ovn-nbctl --wait=hv sync" in ovn-northd.at. I see what you mean. This is surprising to me. When I defined this stuff years ago, my intent was that --wait=hv would be stronger than --wait=sb. That is, --wait=sb would cause ovn-nbctl to wait for the southbound database to catch up with the northbound changes, and --wait=hb would cause it to wait for the southbound database AND the hypervisors to catch up. That's even how it's documented for ovn-nbctl: <p> With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, it waits for <code>ovn-northd</code> to bring the southbound database up-to-date with the northbound database updates. </p> <p> With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, it additionally waits for all OVN chassis (hypervisors and gateways) to become up-to-date with the northbound database updates. (This can become an indefinite wait if any chassis is malfunctioning.) </p> That's not what actually happens, though. As you point out, if there are no hypervisors, then the hypervisors are always up-to-date, even if the database is not. I think this is a bug in ovn-nbctl. Arguably, it's a bug in the database definition (perhaps hv_cfg shouldn't even be filled in but left empty if there are no chassis) but I think it is too late to fix it there. It is, however, easy enough to fix it in ovn-nbctl. (And at the same time, I'll change these in ovn-northd to just say --wait=sb. You have a point.) How about this as an additional patch? -8<--------------------------cut here-------------------------->8-- From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001 From: Ben Pfaff <blp@ovn.org> Date: Mon, 9 Nov 2020 16:12:50 -0800 Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database in corner case. In the corner case where there are no chassis, --wait=hv didn't wait at all, instead of waiting for the southbound database. Reported-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> --- ovn-nb.xml | 27 ++++++++++++++++++++------- utilities/ovn-nbctl.c | 2 +- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/ovn-nb.xml b/ovn-nb.xml index 51b186b84419..d3e51f3e5e87 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -84,13 +84,26 @@ </column> <column name="hv_cfg"> - Sequence number that <code>ovn-northd</code> sets to the smallest - sequence number of all the chassis in the system, as reported in the - <code>Chassis_Private</code> table in the southbound database. Thus, - <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are - caught up with the northbound configuration (which may never happen, if - any chassis is down). This value can regress, if a chassis was removed - from the system and rejoins before catching up. + <p> + Sequence number that <code>ovn-northd</code> sets to the smallest + sequence number of all the chassis in the system, as reported in the + <code>Chassis_Private</code> table in the southbound database. Thus, + <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are + caught up with the northbound configuration (which may never happen, if + any chassis is down). This value can regress, if a chassis was removed + from the system and rejoins before catching up. + </p> + + <p> + If there are no chassis, then <code>ovn-northd</code> copies + <code>nb_cfg</code> to <ref column="hv_cfg"/>. Thus, in this case, + the (nonexistent) hypervisors are always considered to be caught up. + This means that hypervisors can be "caught up" even in cases where + <ref column="sb_cfg"/> would show that the southbound database is + not. To detect when both the hypervisors and the southbound database + are caught up, a client should take the smaller of <ref + column="sb_cfg"/> and <ref column="hv_cfg"/>. + </p> </column> <column name="hv_cfg_timestamp"> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c index 4f28edc808ec..ee9849de53a9 100644 --- a/utilities/ovn-nbctl.c +++ b/utilities/ovn-nbctl.c @@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command *commands, size_t n_commands, NBREC_NB_GLOBAL_FOR_EACH (nb, idl) { int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB ? nb->sb_cfg - : nb->hv_cfg); + : MIN(nb->sb_cfg, nb->hv_cfg)); if (cur_cfg >= next_cfg) { if (print_wait_time) { printf("Time spent on processing nb_cfg %"PRId64":\n",
On 11/10/20 1:26 AM, Ben Pfaff wrote: > On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote: >> On 11/6/20 4:36 AM, Ben Pfaff wrote: >>> Many of these could be replaced by "ovn-nbctl sync". Some weren't >>> really needed at all because they were adjacent to something that itself >>> called sync or otherwise used --wait. Some were more appropriately >>> done with explicit waits for what was really needed. >>> >>> I left some "sleep"s. Some were because they were "negative" sleeps: >>> they were giving time for something to happen that should *not* happen >>> (in other words, if you wait for it to happen, you'll wait forever, >>> unless there's a bug). Some were because I didn't know how to properly >>> wait for what they were waiting for, or because I didn't understand >>> the circumstances deeply enough. >>> >>> Signed-off-by: Ben Pfaff <blp@ovn.org> >>> --- >>> tests/ovn-northd.at | 7 ++ >>> tests/ovn.at | 167 ++++++++++++-------------------------------- >>> 2 files changed, 52 insertions(+), 122 deletions(-) >>> >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>> index 949a8eee054e..5d670233561e 100644 >>> --- a/tests/ovn-northd.at >>> +++ b/tests/ovn-northd.at >>> @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \ >>> Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ >>> health_check @hc >>> wait_row_count Service_Monitor 2 >>> +check ovn-nbctl --wait=hv sync >> >> This should be "--wait=sb". ovn-controller is not started for tests in >> ovn-northd.at and "--wait=hv" would return immediately. This applies to >> all "ovn-nbctl --wait=hv sync" in ovn-northd.at. > > I see what you mean. > > This is surprising to me. When I defined this stuff years ago, my > intent was that --wait=hv would be stronger than --wait=sb. That is, > --wait=sb would cause ovn-nbctl to wait for the southbound database to > catch up with the northbound changes, and --wait=hb would cause it to > wait for the southbound database AND the hypervisors to catch up. > > That's even how it's documented for ovn-nbctl: > > <p> > With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, it > waits for <code>ovn-northd</code> to bring the southbound database > up-to-date with the northbound database updates. > </p> > > <p> > With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, it > additionally waits for all OVN chassis (hypervisors and gateways) to > become up-to-date with the northbound database updates. (This can > become an indefinite wait if any chassis is malfunctioning.) > </p> > > That's not what actually happens, though. As you point out, if there > are no hypervisors, then the hypervisors are always up-to-date, even if > the database is not. > > I think this is a bug in ovn-nbctl. Arguably, it's a bug in the > database definition (perhaps hv_cfg shouldn't even be filled in but left > empty if there are no chassis) but I think it is too late to fix it > there. It is, however, easy enough to fix it in ovn-nbctl. > > (And at the same time, I'll change these in ovn-northd to just say > --wait=sb. You have a point.) > > How about this as an additional patch? > > -8<--------------------------cut here-------------------------->8-- > > From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001 > From: Ben Pfaff <blp@ovn.org> > Date: Mon, 9 Nov 2020 16:12:50 -0800 > Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database in > corner case. > > In the corner case where there are no chassis, --wait=hv didn't wait at > all, instead of waiting for the southbound database. > > Reported-by: Dumitru Ceara <dceara@redhat.com> > Signed-off-by: Ben Pfaff <blp@ovn.org> > --- > ovn-nb.xml | 27 ++++++++++++++++++++------- > utilities/ovn-nbctl.c | 2 +- > 2 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/ovn-nb.xml b/ovn-nb.xml > index 51b186b84419..d3e51f3e5e87 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -84,13 +84,26 @@ > </column> > > <column name="hv_cfg"> > - Sequence number that <code>ovn-northd</code> sets to the smallest > - sequence number of all the chassis in the system, as reported in the > - <code>Chassis_Private</code> table in the southbound database. Thus, > - <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are > - caught up with the northbound configuration (which may never happen, if > - any chassis is down). This value can regress, if a chassis was removed > - from the system and rejoins before catching up. > + <p> > + Sequence number that <code>ovn-northd</code> sets to the smallest > + sequence number of all the chassis in the system, as reported in the > + <code>Chassis_Private</code> table in the southbound database. Thus, > + <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are > + caught up with the northbound configuration (which may never happen, if > + any chassis is down). This value can regress, if a chassis was removed > + from the system and rejoins before catching up. > + </p> > + > + <p> > + If there are no chassis, then <code>ovn-northd</code> copies > + <code>nb_cfg</code> to <ref column="hv_cfg"/>. Thus, in this case, > + the (nonexistent) hypervisors are always considered to be caught up. > + This means that hypervisors can be "caught up" even in cases where > + <ref column="sb_cfg"/> would show that the southbound database is > + not. To detect when both the hypervisors and the southbound database > + are caught up, a client should take the smaller of <ref > + column="sb_cfg"/> and <ref column="hv_cfg"/>. > + </p> > </column> > > <column name="hv_cfg_timestamp"> > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c > index 4f28edc808ec..ee9849de53a9 100644 > --- a/utilities/ovn-nbctl.c > +++ b/utilities/ovn-nbctl.c > @@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command *commands, size_t n_commands, > NBREC_NB_GLOBAL_FOR_EACH (nb, idl) { > int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB > ? nb->sb_cfg > - : nb->hv_cfg); > + : MIN(nb->sb_cfg, nb->hv_cfg)); > if (cur_cfg >= next_cfg) { > if (print_wait_time) { > printf("Time spent on processing nb_cfg %"PRId64":\n", > Looks good to me, thanks!
On Tue, Nov 10, 2020 at 09:47:56AM +0100, Dumitru Ceara wrote: > On 11/10/20 1:26 AM, Ben Pfaff wrote: > > On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote: > >> On 11/6/20 4:36 AM, Ben Pfaff wrote: > >>> Many of these could be replaced by "ovn-nbctl sync". Some weren't > >>> really needed at all because they were adjacent to something that itself > >>> called sync or otherwise used --wait. Some were more appropriately > >>> done with explicit waits for what was really needed. > >>> > >>> I left some "sleep"s. Some were because they were "negative" sleeps: > >>> they were giving time for something to happen that should *not* happen > >>> (in other words, if you wait for it to happen, you'll wait forever, > >>> unless there's a bug). Some were because I didn't know how to properly > >>> wait for what they were waiting for, or because I didn't understand > >>> the circumstances deeply enough. > >>> > >>> Signed-off-by: Ben Pfaff <blp@ovn.org> > >>> --- > >>> tests/ovn-northd.at | 7 ++ > >>> tests/ovn.at | 167 ++++++++++++-------------------------------- > >>> 2 files changed, 52 insertions(+), 122 deletions(-) > >>> > >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > >>> index 949a8eee054e..5d670233561e 100644 > >>> --- a/tests/ovn-northd.at > >>> +++ b/tests/ovn-northd.at > >>> @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \ > >>> Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ > >>> health_check @hc > >>> wait_row_count Service_Monitor 2 > >>> +check ovn-nbctl --wait=hv sync > >> > >> This should be "--wait=sb". ovn-controller is not started for tests in > >> ovn-northd.at and "--wait=hv" would return immediately. This applies to > >> all "ovn-nbctl --wait=hv sync" in ovn-northd.at. > > > > I see what you mean. > > > > This is surprising to me. When I defined this stuff years ago, my > > intent was that --wait=hv would be stronger than --wait=sb. That is, > > --wait=sb would cause ovn-nbctl to wait for the southbound database to > > catch up with the northbound changes, and --wait=hb would cause it to > > wait for the southbound database AND the hypervisors to catch up. > > > > That's even how it's documented for ovn-nbctl: > > > > <p> > > With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, it > > waits for <code>ovn-northd</code> to bring the southbound database > > up-to-date with the northbound database updates. > > </p> > > > > <p> > > With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, it > > additionally waits for all OVN chassis (hypervisors and gateways) to > > become up-to-date with the northbound database updates. (This can > > become an indefinite wait if any chassis is malfunctioning.) > > </p> > > > > That's not what actually happens, though. As you point out, if there > > are no hypervisors, then the hypervisors are always up-to-date, even if > > the database is not. > > > > I think this is a bug in ovn-nbctl. Arguably, it's a bug in the > > database definition (perhaps hv_cfg shouldn't even be filled in but left > > empty if there are no chassis) but I think it is too late to fix it > > there. It is, however, easy enough to fix it in ovn-nbctl. > > > > (And at the same time, I'll change these in ovn-northd to just say > > --wait=sb. You have a point.) > > > > How about this as an additional patch? > > > > -8<--------------------------cut here-------------------------->8-- > > > > From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001 > > From: Ben Pfaff <blp@ovn.org> > > Date: Mon, 9 Nov 2020 16:12:50 -0800 > > Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database in > > corner case. > > > > In the corner case where there are no chassis, --wait=hv didn't wait at > > all, instead of waiting for the southbound database. > > > > Reported-by: Dumitru Ceara <dceara@redhat.com> > > Signed-off-by: Ben Pfaff <blp@ovn.org> > > --- > > ovn-nb.xml | 27 ++++++++++++++++++++------- > > utilities/ovn-nbctl.c | 2 +- > > 2 files changed, 21 insertions(+), 8 deletions(-) > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index 51b186b84419..d3e51f3e5e87 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -84,13 +84,26 @@ > > </column> > > > > <column name="hv_cfg"> > > - Sequence number that <code>ovn-northd</code> sets to the smallest > > - sequence number of all the chassis in the system, as reported in the > > - <code>Chassis_Private</code> table in the southbound database. Thus, > > - <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are > > - caught up with the northbound configuration (which may never happen, if > > - any chassis is down). This value can regress, if a chassis was removed > > - from the system and rejoins before catching up. > > + <p> > > + Sequence number that <code>ovn-northd</code> sets to the smallest > > + sequence number of all the chassis in the system, as reported in the > > + <code>Chassis_Private</code> table in the southbound database. Thus, > > + <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are > > + caught up with the northbound configuration (which may never happen, if > > + any chassis is down). This value can regress, if a chassis was removed > > + from the system and rejoins before catching up. > > + </p> > > + > > + <p> > > + If there are no chassis, then <code>ovn-northd</code> copies > > + <code>nb_cfg</code> to <ref column="hv_cfg"/>. Thus, in this case, > > + the (nonexistent) hypervisors are always considered to be caught up. > > + This means that hypervisors can be "caught up" even in cases where > > + <ref column="sb_cfg"/> would show that the southbound database is > > + not. To detect when both the hypervisors and the southbound database > > + are caught up, a client should take the smaller of <ref > > + column="sb_cfg"/> and <ref column="hv_cfg"/>. > > + </p> > > </column> > > > > <column name="hv_cfg_timestamp"> > > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c > > index 4f28edc808ec..ee9849de53a9 100644 > > --- a/utilities/ovn-nbctl.c > > +++ b/utilities/ovn-nbctl.c > > @@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command *commands, size_t n_commands, > > NBREC_NB_GLOBAL_FOR_EACH (nb, idl) { > > int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB > > ? nb->sb_cfg > > - : nb->hv_cfg); > > + : MIN(nb->sb_cfg, nb->hv_cfg)); > > if (cur_cfg >= next_cfg) { > > if (print_wait_time) { > > printf("Time spent on processing nb_cfg %"PRId64":\n", > > > > Looks good to me, thanks! Thanks, I applied this bug fix to OVN master.
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 949a8eee054e..5d670233561e 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \ Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \ health_check @hc wait_row_count Service_Monitor 2 +check ovn-nbctl --wait=hv sync ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt AT_CHECK([cat lflows.txt | sed 's/table=..//'], [0], [dnl @@ -1139,6 +1140,7 @@ OVS_WAIT_FOR_OUTPUT( # Set the service monitor for sw1-p1 to offline check ovn-sbctl set service_monitor sw1-p1 status=offline wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=offline +check ovn-nbctl --wait=hv sync AT_CAPTURE_FILE([sbflows4]) OVS_WAIT_FOR_OUTPUT( @@ -1150,6 +1152,7 @@ OVS_WAIT_FOR_OUTPUT( ovn-sbctl set service_monitor $sm_sw0_p1 status=offline wait_row_count Service_Monitor 1 logical_port=sw0-p1 status=offline +check ovn-nbctl --wait=hv sync AT_CAPTURE_FILE([sbflows5]) OVS_WAIT_FOR_OUTPUT( @@ -1166,6 +1169,7 @@ ovn-sbctl set service_monitor $sm_sw0_p1 status=online ovn-sbctl set service_monitor $sm_sw1_p1 status=online wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=online +check ovn-nbctl --wait=hv sync AT_CAPTURE_FILE([sbflows7]) OVS_WAIT_FOR_OUTPUT( @@ -1176,6 +1180,7 @@ OVS_WAIT_FOR_OUTPUT( # Set the service monitor for sw1-p1 to error ovn-sbctl set service_monitor $sm_sw1_p1 status=error wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=error +check ovn-nbctl --wait=hv sync ovn-sbctl dump-flows sw0 | grep "ip4.dst == 10.0.0.10 && tcp.dst == 80" \ | grep priority=120 > lflows.txt @@ -1214,6 +1219,7 @@ OVS_WAIT_FOR_OUTPUT( check ovn-sbctl set service_monitor sw1-p1 status=online wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=online +check ovn-nbctl --wait=hv sync AT_CAPTURE_FILE([sbflows10]) OVS_WAIT_FOR_OUTPUT( @@ -1244,6 +1250,7 @@ AT_CHECK([ovn-nbctl -- --id=@hc create Load_Balancer_Health_Check vip="10.0.0.10 check ovn-nbctl ls-lb-add sw0 lb2 check ovn-nbctl ls-lb-add sw1 lb2 +check ovn-nbctl --wait=hv sync wait_row_count Service_Monitor 5 diff --git a/tests/ovn.at b/tests/ovn.at index 243e17ad8b16..bb4bb22fef5c 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -1790,10 +1790,6 @@ check ovn-nbctl --wait=hv sync # for ARP resolution). OVN_POPULATE_ARP -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 - # Make sure there is no attempt to adding duplicated flows by ovn-controller AT_FAIL_IF([test -n "`grep duplicate hv1/ovn-controller.log`"]) AT_FAIL_IF([test -n "`grep duplicate hv2/ovn-controller.log`"]) @@ -1988,11 +1984,7 @@ done # set address for lp13 with invalid characters. # lp13 should be configured with only 192.168.0.13. -ovn-nbctl lsp-set-addresses lp13 "f0:00:00:00:00:13 192.168.0.13 invalid 192.169.0.13" - -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +ovn-nbctl --wait=hv lsp-set-addresses lp13 "f0:00:00:00:00:13 192.168.0.13 invalid 192.169.0.13" sip=`ip_to_hex 192 168 0 11` tip=`ip_to_hex 192 168 0 13` @@ -2067,7 +2059,11 @@ for i in 1 2; do done done -sleep 1 +# Wait for bindings to take effect. +wait_row_count Port_Binding 1 logical_port=lp11 'encap!=[[]]' +wait_row_count Port_Binding 1 logical_port=lp12 'encap!=[[]]' +wait_row_count Port_Binding 1 logical_port=lp21 'encap!=[[]]' +wait_row_count Port_Binding 1 logical_port=lp22 'encap!=[[]]' # dump port bindings; since we have vxlan and geneve tunnels, we expect the # ports to be bound to geneve tunnels. @@ -2087,9 +2083,7 @@ check_row_count Port_Binding 1 logical_port=lp22 encap=$encap_rec # for ARP resolution). OVN_POPULATE_ARP -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Make sure there is no attempt to adding duplicated flows by ovn-controller AT_FAIL_IF([test -n "`grep duplicate hv1/ovn-controller.log`"]) @@ -3205,9 +3199,7 @@ ovs-vsctl add-port br-phys vif3 -- set Interface vif3 options:tx_pcap=hv3/vif3-t # for ARP resolution). OVN_POPULATE_ARP -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # test_packet INPORT DST SRC ETHTYPE OUTPORT... # @@ -3373,9 +3365,7 @@ ovs-vsctl add-port br-phys vif3 -- set Interface vif3 options:tx_pcap=hv3/vif3-t # for ARP resolution). OVN_POPULATE_ARP -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # test_packet INPORT DST SRC ETHTYPE OUTPORT... # @@ -3573,8 +3563,7 @@ check ovn-nbctl --wait=hv sync # for ARP resolution). OVN_POPULATE_ARP -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. +check ovn-nbctl --wait=hv sync # test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT... # @@ -3974,8 +3963,7 @@ done OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT... # @@ -4148,8 +4136,7 @@ done OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Given the name of a logical port, prints the name of the hypervisor # on which it is located. @@ -4583,8 +4570,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \ OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Packet to send. packet="inport==\"ls1-lp1\" && eth.src==$ls1_lp1_mac && eth.dst==$rp_ls1_mac && @@ -4695,9 +4681,7 @@ ovs-vsctl -- add-port br-int vif2 -- \ ofport-request=1 -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Send ip packets between the two ports. @@ -4729,11 +4713,7 @@ as hv1 ovs-ofctl dump-flows br-int #Disable router R1 -ovn-nbctl set Logical_Router R1 enabled=false - -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +ovn-nbctl --wait=hv set Logical_Router R1 enabled=false echo "---------SB dump-----" ovn-sbctl list datapath_binding @@ -4809,10 +4789,8 @@ ovs-vsctl -- add-port br-int vif2 -- \ options:rxq_pcap=hv1/vif2-rx.pcap \ ofport-request=1 +check ovn-nbctl --wait=hv sync -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 # Send ip packets between the two ports. @@ -4843,7 +4821,7 @@ echo "------ hv1 dump ----------" as hv1 ovs-ofctl dump-flows br-int #Disable router R1 -ovn-nbctl set Logical_Router R1 enabled=false +ovn-nbctl --wait=hv set Logical_Router R1 enabled=false echo "---------SB dump-----" ovn-sbctl list datapath_binding @@ -4854,9 +4832,6 @@ echo "---------------------" echo "------ hv1 dump ----------" as hv1 ovs-ofctl dump-flows br-int -# Allow some time for the disabling of logical router R1 to propagate. -# XXX This should be more systematic. -sleep 1 as hv1 ovs-appctl netdev-dummy/receive vif1 $packet @@ -4961,8 +4936,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \ OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Send ip packets between foo1 and alice1 src_mac="f00000010203" @@ -5184,8 +5158,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \ OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Send ip packets between foo1 and alice1 src_mac="f00000010203" @@ -5322,7 +5295,7 @@ as hv1 ovs-appctl vlog/set dbg OVN_POPULATE_ARP -sleep 2 +check ovn-nbctl --wait=hv sync as hv1 ovs-vsctl show @@ -5987,7 +5960,7 @@ ovs-vsctl -- add-port br-int hv1-vif5 -- \ OVN_POPULATE_ARP -sleep 2 +check ovn-nbctl --wait=hv sync trim_zeros() { sed 's/\(00\)\{1,\}$//' @@ -6269,10 +6242,7 @@ ovn-nbctl lsp-add foo foo1 \ ovn-nbctl lsp-add alice alice1 \ -- lsp-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2" - -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 2 +check ovn-nbctl --wait=hv sync # Send ip packets between foo1 and alice1 src_mac="f00000010203" @@ -6335,7 +6305,7 @@ ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \ R2 static_routes @lrt # Wait for ovn-controller to catch up. -sleep 1 +check ovn-nbctl --wait=hv sync # Send the packet again. as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet @@ -6407,10 +6377,8 @@ ovs-vsctl -- add-port br-int vif2 -- \ options:rxq_pcap=hv1/vif2-rx.pcap \ ofport-request=1 - # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync for i in 1 2; do @@ -6568,10 +6536,6 @@ ovs-vsctl -- add-port br-int vif3 -- \ options:rxq_pcap=pbr-hv/vif3-rx.pcap \ ofport-request=1 -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 - ls1_ro_mac=00:00:00:01:02:f1 ls1_ro_ip=192.168.1.1 @@ -6758,10 +6722,6 @@ ovs-vsctl -- add-port br-int vif3 -- \ options:rxq_pcap=pbr-hv/vif3-rx.pcap \ ofport-request=1 -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 - ls1_ro_mac=00:00:00:01:02:f1 ls1_ro_ip=2001::1 @@ -7016,14 +6976,11 @@ ovn-nbctl acl-add lsw0 to-lport 1002 'outport == "lp1" && ip6 && icmp6' allow-r ovn-nbctl acl-add lsw0 to-lport 1002 'outport == "lp2" && ip6 && icmp6' allow-related # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. +# XXX The "sleep" here seems to be essential for ovn-northd-ddlog, +# which may indicate that it needs improvement. +check ovn-nbctl --wait=hv sync sleep 1 -# Given the name of a logical port, prints the name of the hypervisor -# on which it is located. -vif_to_hv() { - echo hv1${1%?} -} for i in 1 2; do : > $i.expected done @@ -7064,9 +7021,7 @@ ovn_attach n1 br-phys 192.168.0.1 row=`ovn-nbctl create Address_Set name=set1 addresses=\"1.1.1.1\"` ovn-nbctl set Address_Set $row name=set1 addresses=\"1.1.1.1,1.1.1.2\" -ovn-nbctl destroy Address_Set $row - -sleep 1 +ovn-nbctl --wait=hv destroy Address_Set $row # A bug previously existed in the address set support code # that caused ovn-controller to crash after an address set @@ -7454,8 +7409,7 @@ ovs-vsctl -- add-port br-int hv1-vif3 -- \ ofport-request=3 # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Send ip packets between foo1 and foo2 src_mac="0a0000a80103" @@ -7666,8 +7620,7 @@ ovs-vsctl -- add-port br-int hv1-ls2lp2 -- \ ofport-request=2 # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync echo "---------NB dump-----" ovn-nbctl show @@ -8796,8 +8749,7 @@ ovs-vsctl -- add-port br-int vm2 -- \ OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Test that ovn-controllers create ct-zone entry for container ports. foo1_zoneid=$(as hv1 ovs-vsctl get bridge br-int external_ids:ct-zone-foo1) @@ -9105,8 +9057,7 @@ ovn-nbctl --wait=hv lsp-add bob bob1 \ OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync trim_zeros() { sed 's/\(00\)\{1,\}$//' @@ -9215,7 +9166,7 @@ ovs-vsctl -- add-port br-int hv1-vif2 -- \ ofport-request=2 OVN_POPULATE_ARP -sleep 2 +check ovn-nbctl --wait=hv sync as hv1 ovs-vsctl show echo "*************************" @@ -9777,13 +9728,9 @@ test_ip_packet() fi as ext1 reset_pcap_file ext1-vif1 ext1/vif1 - sleep 1 - # Resend packet from foo1 to outside1 check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet - sleep 1 - AT_CAPTURE_FILE([exp]) AT_CAPTURE_FILE([rcv]) check_packets() { @@ -9987,8 +9934,7 @@ hv1_ch_uuid=$(fetch_column Chassis _uuid name=hv1) wait_column "$hv1_ch_uuid" HA_Chassis_Group ref_chassis # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 2 +check ovn-nbctl --wait=hv sync reset_pcap_file() { local iface=$1 @@ -10380,8 +10326,7 @@ ovn-nbctl lsp-add foo foo2 \ -- lsp-set-addresses foo2 "f0:00:00:01:02:06 192.168.1.3" # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync : > hv1-vif2.expected @@ -10472,10 +10417,6 @@ AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) AT_CHECK([ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1]) -# Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 2 - # Expect no packets when hv2 bridge-mapping is not present : > packets OVN_CHECK_PACKETS([hv1/snoopvif-tx.pcap], [packets]) @@ -10697,16 +10638,15 @@ ovn-nbctl lsp-set-addresses ln-outside unknown ovn-nbctl lsp-set-type ln-outside localnet ovn-nbctl lsp-set-options ln-outside network_name=phys -# Allow some time for ovn-northd and ovn-controller to catch up. -check ovn-nbctl --wait=hv sync - # Check that there is a logical flow in logical switch foo's pipeline # to set the outport to rp-foo (which is expected). OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \ grep rp-foo | grep -v is_chassis_resident | grep priority=50 -c`]) # Set the option 'reside-on-redirect-chassis' for foo -check ovn-nbctl --wait=hv set logical_router_port foo options:reside-on-redirect-chassis=true +check ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true +check ovn-nbctl --wait=hv sync + # Check that there is a logical flow in logical switch foo's pipeline # to set the outport to rp-foo with the condition is_chassis_redirect. ovn-sbctl dump-flows foo @@ -10935,8 +10875,7 @@ ovs-vsctl -- add-port br-int hv1-vif3 -- \ ofport-request=3 # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync reset_pcap_file() { local iface=$1 @@ -11191,8 +11130,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \ OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Send ip packets between foo1 and alice1 src_mac="f00000010203" @@ -12776,8 +12714,6 @@ for i in 1 2 3; do done OVN_POPULATE_ARP -# allow some time for ovn-northd and ovn-controller to catch up. -sleep 1 for i in 1 2 3; do : > vif${i}1.expected @@ -12936,8 +12872,7 @@ done OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT... # @@ -13016,10 +12951,6 @@ for is in 1 2 3; do done done -# Allow some time for packet forwarding. -# XXX This can be improved. -sleep 1 - # Now check the packets actually received against the ones expected. for i in 1 2 3; do for j in 1 2 3; do @@ -13160,8 +13091,7 @@ done OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync lsp_to_mac() { echo f0:00:00:00:0${1:0:1}:${1:1:2} @@ -13267,10 +13197,6 @@ for is in 1 2 3; do done done -# Allow some time for packet forwarding. -# XXX This can be improved. -sleep 1 - # Now check the packets actually received against the ones expected. for i in 1 2 3; do for j in 1 2 3; do @@ -13741,8 +13667,6 @@ reset_pcap_file hv1-vif2 hv1/vif2 rm -f 2.packets > 2.expected -#sleep infinity - # Remove the first less restrictive allow ACL. ovn-nbctl acl-del ls1 to-lport 3 'ip4.src==10.0.0.1 || ip4.src==10.0.0.1' ovn-nbctl --wait=hv sync @@ -13865,8 +13789,7 @@ ovn-nbctl create Address_Set name=set1 addresses=\"f0:00:00:00:00:11\",\"f0:00:0 OVN_POPULATE_ARP # Allow some time for ovn-northd and ovn-controller to catch up. -# XXX This should be more systematic. -sleep 1 +check ovn-nbctl --wait=hv sync # Make sure there is no attempt to adding duplicated flows by ovn-controller AT_FAIL_IF([test -n "`grep duplicate hv1/ovn-controller.log`"]) @@ -14327,7 +14250,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \ OVN_POPULATE_ARP -sleep 1 +check ovn-nbctl --wait=hv sync packet="inport==\"sw1-p1\" && eth.src==$sw1_p1_mac && eth.dst==$sw1_ro_mac && ip4 && ip.ttl==64 && ip4.src==$sw1_p1_ip && ip4.dst==$sw2_p1_ip && @@ -15250,7 +15173,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \ OVN_POPULATE_ARP -sleep 1 +check ovn-nbctl --wait=hv sync packet="inport==\"sw1-p1\" && eth.src==$sw1_p1_mac && eth.dst==$sw1_ro_mac && ip4 && ip.ttl==64 && ip4.src==$sw1_p1_ip && ip4.dst==$sw2_p1_ip &&
Many of these could be replaced by "ovn-nbctl sync". Some weren't really needed at all because they were adjacent to something that itself called sync or otherwise used --wait. Some were more appropriately done with explicit waits for what was really needed. I left some "sleep"s. Some were because they were "negative" sleeps: they were giving time for something to happen that should *not* happen (in other words, if you wait for it to happen, you'll wait forever, unless there's a bug). Some were because I didn't know how to properly wait for what they were waiting for, or because I didn't understand the circumstances deeply enough. Signed-off-by: Ben Pfaff <blp@ovn.org> --- tests/ovn-northd.at | 7 ++ tests/ovn.at | 167 ++++++++++++-------------------------------- 2 files changed, 52 insertions(+), 122 deletions(-)