diff mbox series

[ovs-dev,v3,4/6] tests: Eliminate most "sleep" calls.

Message ID 20201106033641.1848523-5-blp@ovn.org
State Superseded
Headers show
Series Add DDlog implementation of ovn-northd | expand

Commit Message

Ben Pfaff Nov. 6, 2020, 3:36 a.m. UTC
Many of these could be replaced by "ovn-nbctl sync".  Some weren't
really needed at all because they were adjacent to something that itself
called sync or otherwise used --wait.  Some were more appropriately
done with explicit waits for what was really needed.

I left some "sleep"s.  Some were because they were "negative" sleeps:
they were giving time for something to happen that should *not* happen
(in other words, if you wait for it to happen, you'll wait forever,
unless there's a bug).  Some were because I didn't know how to properly
wait for what they were waiting for, or because I didn't understand
the circumstances deeply enough.

Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 tests/ovn-northd.at |   7 ++
 tests/ovn.at        | 167 ++++++++++++--------------------------------
 2 files changed, 52 insertions(+), 122 deletions(-)

Comments

Dumitru Ceara Nov. 9, 2020, 1:44 p.m. UTC | #1
On 11/6/20 4:36 AM, Ben Pfaff wrote:
> Many of these could be replaced by "ovn-nbctl sync".  Some weren't
> really needed at all because they were adjacent to something that itself
> called sync or otherwise used --wait.  Some were more appropriately
> done with explicit waits for what was really needed.
> 
> I left some "sleep"s.  Some were because they were "negative" sleeps:
> they were giving time for something to happen that should *not* happen
> (in other words, if you wait for it to happen, you'll wait forever,
> unless there's a bug).  Some were because I didn't know how to properly
> wait for what they were waiting for, or because I didn't understand
> the circumstances deeply enough.
> 
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
>  tests/ovn-northd.at |   7 ++
>  tests/ovn.at        | 167 ++++++++++++--------------------------------
>  2 files changed, 52 insertions(+), 122 deletions(-)
> 
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 949a8eee054e..5d670233561e 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \
>  Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \
>  health_check @hc
>  wait_row_count Service_Monitor 2
> +check ovn-nbctl --wait=hv sync

This should be "--wait=sb".  ovn-controller is not started for tests in
ovn-northd.at and "--wait=hv" would return immediately.  This applies to
all "ovn-nbctl --wait=hv sync" in ovn-northd.at.

Thanks,
Dumitru
Ben Pfaff Nov. 10, 2020, 12:26 a.m. UTC | #2
On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote:
> On 11/6/20 4:36 AM, Ben Pfaff wrote:
> > Many of these could be replaced by "ovn-nbctl sync".  Some weren't
> > really needed at all because they were adjacent to something that itself
> > called sync or otherwise used --wait.  Some were more appropriately
> > done with explicit waits for what was really needed.
> > 
> > I left some "sleep"s.  Some were because they were "negative" sleeps:
> > they were giving time for something to happen that should *not* happen
> > (in other words, if you wait for it to happen, you'll wait forever,
> > unless there's a bug).  Some were because I didn't know how to properly
> > wait for what they were waiting for, or because I didn't understand
> > the circumstances deeply enough.
> > 
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> >  tests/ovn-northd.at |   7 ++
> >  tests/ovn.at        | 167 ++++++++++++--------------------------------
> >  2 files changed, 52 insertions(+), 122 deletions(-)
> > 
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index 949a8eee054e..5d670233561e 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \
> >  Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \
> >  health_check @hc
> >  wait_row_count Service_Monitor 2
> > +check ovn-nbctl --wait=hv sync
> 
> This should be "--wait=sb".  ovn-controller is not started for tests in
> ovn-northd.at and "--wait=hv" would return immediately.  This applies to
> all "ovn-nbctl --wait=hv sync" in ovn-northd.at.

I see what you mean.

This is surprising to me.  When I defined this stuff years ago, my
intent was that --wait=hv would be stronger than --wait=sb.  That is,
--wait=sb would cause ovn-nbctl to wait for the southbound database to
catch up with the northbound changes, and --wait=hb would cause it to
wait for the southbound database AND the hypervisors to catch up.

That's even how it's documented for ovn-nbctl:

        <p>
          With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, it
          waits for <code>ovn-northd</code> to bring the southbound database
          up-to-date with the northbound database updates.
        </p>

        <p>
          With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, it
          additionally waits for all OVN chassis (hypervisors and gateways) to
          become up-to-date with the northbound database updates.  (This can
          become an indefinite wait if any chassis is malfunctioning.)
        </p>

That's not what actually happens, though.  As you point out, if there
are no hypervisors, then the hypervisors are always up-to-date, even if
the database is not.

I think this is a bug in ovn-nbctl.  Arguably, it's a bug in the
database definition (perhaps hv_cfg shouldn't even be filled in but left
empty if there are no chassis) but I think it is too late to fix it
there.  It is, however, easy enough to fix it in ovn-nbctl.

(And at the same time, I'll change these in ovn-northd to just say
--wait=sb.  You have a point.)

How about this as an additional patch?

-8<--------------------------cut here-------------------------->8--

From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001
From: Ben Pfaff <blp@ovn.org>
Date: Mon, 9 Nov 2020 16:12:50 -0800
Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database in
 corner case.

In the corner case where there are no chassis, --wait=hv didn't wait at
all, instead of waiting for the southbound database.

Reported-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 ovn-nb.xml            | 27 ++++++++++++++++++++-------
 utilities/ovn-nbctl.c |  2 +-
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/ovn-nb.xml b/ovn-nb.xml
index 51b186b84419..d3e51f3e5e87 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -84,13 +84,26 @@
       </column>
 
       <column name="hv_cfg">
-        Sequence number that <code>ovn-northd</code> sets to the smallest
-        sequence number of all the chassis in the system, as reported in the
-        <code>Chassis_Private</code> table in the southbound database.  Thus,
-        <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are
-        caught up with the northbound configuration (which may never happen, if
-        any chassis is down).  This value can regress, if a chassis was removed
-        from the system and rejoins before catching up.
+        <p>
+          Sequence number that <code>ovn-northd</code> sets to the smallest
+          sequence number of all the chassis in the system, as reported in the
+          <code>Chassis_Private</code> table in the southbound database.  Thus,
+          <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are
+          caught up with the northbound configuration (which may never happen, if
+          any chassis is down).  This value can regress, if a chassis was removed
+          from the system and rejoins before catching up.
+        </p>
+
+        <p>
+          If there are no chassis, then <code>ovn-northd</code> copies
+          <code>nb_cfg</code> to <ref column="hv_cfg"/>.  Thus, in this case,
+          the (nonexistent) hypervisors are always considered to be caught up.
+          This means that hypervisors can be "caught up" even in cases where
+          <ref column="sb_cfg"/> would show that the southbound database is
+          not.  To detect when both the hypervisors and the southbound database
+          are caught up, a client should take the smaller of <ref
+          column="sb_cfg"/> and <ref column="hv_cfg"/>.
+        </p>
       </column>
 
       <column name="hv_cfg_timestamp">
diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
index 4f28edc808ec..ee9849de53a9 100644
--- a/utilities/ovn-nbctl.c
+++ b/utilities/ovn-nbctl.c
@@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command *commands, size_t n_commands,
             NBREC_NB_GLOBAL_FOR_EACH (nb, idl) {
                 int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB
                                    ? nb->sb_cfg
-                                   : nb->hv_cfg);
+                                   : MIN(nb->sb_cfg, nb->hv_cfg));
                 if (cur_cfg >= next_cfg) {
                     if (print_wait_time) {
                         printf("Time spent on processing nb_cfg %"PRId64":\n",
Dumitru Ceara Nov. 10, 2020, 8:47 a.m. UTC | #3
On 11/10/20 1:26 AM, Ben Pfaff wrote:
> On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote:
>> On 11/6/20 4:36 AM, Ben Pfaff wrote:
>>> Many of these could be replaced by "ovn-nbctl sync".  Some weren't
>>> really needed at all because they were adjacent to something that itself
>>> called sync or otherwise used --wait.  Some were more appropriately
>>> done with explicit waits for what was really needed.
>>>
>>> I left some "sleep"s.  Some were because they were "negative" sleeps:
>>> they were giving time for something to happen that should *not* happen
>>> (in other words, if you wait for it to happen, you'll wait forever,
>>> unless there's a bug).  Some were because I didn't know how to properly
>>> wait for what they were waiting for, or because I didn't understand
>>> the circumstances deeply enough.
>>>
>>> Signed-off-by: Ben Pfaff <blp@ovn.org>
>>> ---
>>>  tests/ovn-northd.at |   7 ++
>>>  tests/ovn.at        | 167 ++++++++++++--------------------------------
>>>  2 files changed, 52 insertions(+), 122 deletions(-)
>>>
>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>> index 949a8eee054e..5d670233561e 100644
>>> --- a/tests/ovn-northd.at
>>> +++ b/tests/ovn-northd.at
>>> @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \
>>>  Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \
>>>  health_check @hc
>>>  wait_row_count Service_Monitor 2
>>> +check ovn-nbctl --wait=hv sync
>>
>> This should be "--wait=sb".  ovn-controller is not started for tests in
>> ovn-northd.at and "--wait=hv" would return immediately.  This applies to
>> all "ovn-nbctl --wait=hv sync" in ovn-northd.at.
> 
> I see what you mean.
> 
> This is surprising to me.  When I defined this stuff years ago, my
> intent was that --wait=hv would be stronger than --wait=sb.  That is,
> --wait=sb would cause ovn-nbctl to wait for the southbound database to
> catch up with the northbound changes, and --wait=hb would cause it to
> wait for the southbound database AND the hypervisors to catch up.
> 
> That's even how it's documented for ovn-nbctl:
> 
>         <p>
>           With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, it
>           waits for <code>ovn-northd</code> to bring the southbound database
>           up-to-date with the northbound database updates.
>         </p>
> 
>         <p>
>           With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, it
>           additionally waits for all OVN chassis (hypervisors and gateways) to
>           become up-to-date with the northbound database updates.  (This can
>           become an indefinite wait if any chassis is malfunctioning.)
>         </p>
> 
> That's not what actually happens, though.  As you point out, if there
> are no hypervisors, then the hypervisors are always up-to-date, even if
> the database is not.
> 
> I think this is a bug in ovn-nbctl.  Arguably, it's a bug in the
> database definition (perhaps hv_cfg shouldn't even be filled in but left
> empty if there are no chassis) but I think it is too late to fix it
> there.  It is, however, easy enough to fix it in ovn-nbctl.
> 
> (And at the same time, I'll change these in ovn-northd to just say
> --wait=sb.  You have a point.)
> 
> How about this as an additional patch?
> 
> -8<--------------------------cut here-------------------------->8--
> 
> From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001
> From: Ben Pfaff <blp@ovn.org>
> Date: Mon, 9 Nov 2020 16:12:50 -0800
> Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database in
>  corner case.
> 
> In the corner case where there are no chassis, --wait=hv didn't wait at
> all, instead of waiting for the southbound database.
> 
> Reported-by: Dumitru Ceara <dceara@redhat.com>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
>  ovn-nb.xml            | 27 ++++++++++++++++++++-------
>  utilities/ovn-nbctl.c |  2 +-
>  2 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 51b186b84419..d3e51f3e5e87 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -84,13 +84,26 @@
>        </column>
>  
>        <column name="hv_cfg">
> -        Sequence number that <code>ovn-northd</code> sets to the smallest
> -        sequence number of all the chassis in the system, as reported in the
> -        <code>Chassis_Private</code> table in the southbound database.  Thus,
> -        <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are
> -        caught up with the northbound configuration (which may never happen, if
> -        any chassis is down).  This value can regress, if a chassis was removed
> -        from the system and rejoins before catching up.
> +        <p>
> +          Sequence number that <code>ovn-northd</code> sets to the smallest
> +          sequence number of all the chassis in the system, as reported in the
> +          <code>Chassis_Private</code> table in the southbound database.  Thus,
> +          <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are
> +          caught up with the northbound configuration (which may never happen, if
> +          any chassis is down).  This value can regress, if a chassis was removed
> +          from the system and rejoins before catching up.
> +        </p>
> +
> +        <p>
> +          If there are no chassis, then <code>ovn-northd</code> copies
> +          <code>nb_cfg</code> to <ref column="hv_cfg"/>.  Thus, in this case,
> +          the (nonexistent) hypervisors are always considered to be caught up.
> +          This means that hypervisors can be "caught up" even in cases where
> +          <ref column="sb_cfg"/> would show that the southbound database is
> +          not.  To detect when both the hypervisors and the southbound database
> +          are caught up, a client should take the smaller of <ref
> +          column="sb_cfg"/> and <ref column="hv_cfg"/>.
> +        </p>
>        </column>
>  
>        <column name="hv_cfg_timestamp">
> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
> index 4f28edc808ec..ee9849de53a9 100644
> --- a/utilities/ovn-nbctl.c
> +++ b/utilities/ovn-nbctl.c
> @@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command *commands, size_t n_commands,
>              NBREC_NB_GLOBAL_FOR_EACH (nb, idl) {
>                  int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB
>                                     ? nb->sb_cfg
> -                                   : nb->hv_cfg);
> +                                   : MIN(nb->sb_cfg, nb->hv_cfg));
>                  if (cur_cfg >= next_cfg) {
>                      if (print_wait_time) {
>                          printf("Time spent on processing nb_cfg %"PRId64":\n",
> 

Looks good to me, thanks!
Ben Pfaff Nov. 10, 2020, 6:36 p.m. UTC | #4
On Tue, Nov 10, 2020 at 09:47:56AM +0100, Dumitru Ceara wrote:
> On 11/10/20 1:26 AM, Ben Pfaff wrote:
> > On Mon, Nov 09, 2020 at 02:44:27PM +0100, Dumitru Ceara wrote:
> >> On 11/6/20 4:36 AM, Ben Pfaff wrote:
> >>> Many of these could be replaced by "ovn-nbctl sync".  Some weren't
> >>> really needed at all because they were adjacent to something that itself
> >>> called sync or otherwise used --wait.  Some were more appropriately
> >>> done with explicit waits for what was really needed.
> >>>
> >>> I left some "sleep"s.  Some were because they were "negative" sleeps:
> >>> they were giving time for something to happen that should *not* happen
> >>> (in other words, if you wait for it to happen, you'll wait forever,
> >>> unless there's a bug).  Some were because I didn't know how to properly
> >>> wait for what they were waiting for, or because I didn't understand
> >>> the circumstances deeply enough.
> >>>
> >>> Signed-off-by: Ben Pfaff <blp@ovn.org>
> >>> ---
> >>>  tests/ovn-northd.at |   7 ++
> >>>  tests/ovn.at        | 167 ++++++++++++--------------------------------
> >>>  2 files changed, 52 insertions(+), 122 deletions(-)
> >>>
> >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >>> index 949a8eee054e..5d670233561e 100644
> >>> --- a/tests/ovn-northd.at
> >>> +++ b/tests/ovn-northd.at
> >>> @@ -1120,6 +1120,7 @@ ovn-nbctl --wait=sb -- --id=@hc create \
> >>>  Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \
> >>>  health_check @hc
> >>>  wait_row_count Service_Monitor 2
> >>> +check ovn-nbctl --wait=hv sync
> >>
> >> This should be "--wait=sb".  ovn-controller is not started for tests in
> >> ovn-northd.at and "--wait=hv" would return immediately.  This applies to
> >> all "ovn-nbctl --wait=hv sync" in ovn-northd.at.
> > 
> > I see what you mean.
> > 
> > This is surprising to me.  When I defined this stuff years ago, my
> > intent was that --wait=hv would be stronger than --wait=sb.  That is,
> > --wait=sb would cause ovn-nbctl to wait for the southbound database to
> > catch up with the northbound changes, and --wait=hb would cause it to
> > wait for the southbound database AND the hypervisors to catch up.
> > 
> > That's even how it's documented for ovn-nbctl:
> > 
> >         <p>
> >           With <code>--wait=sb</code>, before <code>ovn-nbctl</code> exits, it
> >           waits for <code>ovn-northd</code> to bring the southbound database
> >           up-to-date with the northbound database updates.
> >         </p>
> > 
> >         <p>
> >           With <code>--wait=hv</code>, before <code>ovn-nbctl</code> exits, it
> >           additionally waits for all OVN chassis (hypervisors and gateways) to
> >           become up-to-date with the northbound database updates.  (This can
> >           become an indefinite wait if any chassis is malfunctioning.)
> >         </p>
> > 
> > That's not what actually happens, though.  As you point out, if there
> > are no hypervisors, then the hypervisors are always up-to-date, even if
> > the database is not.
> > 
> > I think this is a bug in ovn-nbctl.  Arguably, it's a bug in the
> > database definition (perhaps hv_cfg shouldn't even be filled in but left
> > empty if there are no chassis) but I think it is too late to fix it
> > there.  It is, however, easy enough to fix it in ovn-nbctl.
> > 
> > (And at the same time, I'll change these in ovn-northd to just say
> > --wait=sb.  You have a point.)
> > 
> > How about this as an additional patch?
> > 
> > -8<--------------------------cut here-------------------------->8--
> > 
> > From 7b373c22dbda2f808f3d5f3b8ae6eb67612dbe87 Mon Sep 17 00:00:00 2001
> > From: Ben Pfaff <blp@ovn.org>
> > Date: Mon, 9 Nov 2020 16:12:50 -0800
> > Subject: [PATCH ovn] ovn-nbctl: Make --wait=hv wait for southbound database in
> >  corner case.
> > 
> > In the corner case where there are no chassis, --wait=hv didn't wait at
> > all, instead of waiting for the southbound database.
> > 
> > Reported-by: Dumitru Ceara <dceara@redhat.com>
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> >  ovn-nb.xml            | 27 ++++++++++++++++++++-------
> >  utilities/ovn-nbctl.c |  2 +-
> >  2 files changed, 21 insertions(+), 8 deletions(-)
> > 
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index 51b186b84419..d3e51f3e5e87 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -84,13 +84,26 @@
> >        </column>
> >  
> >        <column name="hv_cfg">
> > -        Sequence number that <code>ovn-northd</code> sets to the smallest
> > -        sequence number of all the chassis in the system, as reported in the
> > -        <code>Chassis_Private</code> table in the southbound database.  Thus,
> > -        <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are
> > -        caught up with the northbound configuration (which may never happen, if
> > -        any chassis is down).  This value can regress, if a chassis was removed
> > -        from the system and rejoins before catching up.
> > +        <p>
> > +          Sequence number that <code>ovn-northd</code> sets to the smallest
> > +          sequence number of all the chassis in the system, as reported in the
> > +          <code>Chassis_Private</code> table in the southbound database.  Thus,
> > +          <ref column="hv_cfg"/> equals <ref column="nb_cfg"/> if all chassis are
> > +          caught up with the northbound configuration (which may never happen, if
> > +          any chassis is down).  This value can regress, if a chassis was removed
> > +          from the system and rejoins before catching up.
> > +        </p>
> > +
> > +        <p>
> > +          If there are no chassis, then <code>ovn-northd</code> copies
> > +          <code>nb_cfg</code> to <ref column="hv_cfg"/>.  Thus, in this case,
> > +          the (nonexistent) hypervisors are always considered to be caught up.
> > +          This means that hypervisors can be "caught up" even in cases where
> > +          <ref column="sb_cfg"/> would show that the southbound database is
> > +          not.  To detect when both the hypervisors and the southbound database
> > +          are caught up, a client should take the smaller of <ref
> > +          column="sb_cfg"/> and <ref column="hv_cfg"/>.
> > +        </p>
> >        </column>
> >  
> >        <column name="hv_cfg_timestamp">
> > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
> > index 4f28edc808ec..ee9849de53a9 100644
> > --- a/utilities/ovn-nbctl.c
> > +++ b/utilities/ovn-nbctl.c
> > @@ -6357,7 +6357,7 @@ do_nbctl(const char *args, struct ctl_command *commands, size_t n_commands,
> >              NBREC_NB_GLOBAL_FOR_EACH (nb, idl) {
> >                  int64_t cur_cfg = (wait_type == NBCTL_WAIT_SB
> >                                     ? nb->sb_cfg
> > -                                   : nb->hv_cfg);
> > +                                   : MIN(nb->sb_cfg, nb->hv_cfg));
> >                  if (cur_cfg >= next_cfg) {
> >                      if (print_wait_time) {
> >                          printf("Time spent on processing nb_cfg %"PRId64":\n",
> > 
> 
> Looks good to me, thanks!

Thanks, I applied this bug fix to OVN master.
diff mbox series

Patch

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 949a8eee054e..5d670233561e 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1120,6 +1120,7 @@  ovn-nbctl --wait=sb -- --id=@hc create \
 Load_Balancer_Health_Check vip="10.0.0.10\:80" -- add Load_Balancer . \
 health_check @hc
 wait_row_count Service_Monitor 2
+check ovn-nbctl --wait=hv sync
 
 ovn-sbctl dump-flows sw0 | grep ct_lb | grep priority=120 > lflows.txt
 AT_CHECK([cat lflows.txt | sed 's/table=..//'], [0], [dnl
@@ -1139,6 +1140,7 @@  OVS_WAIT_FOR_OUTPUT(
 # Set the service monitor for sw1-p1 to offline
 check ovn-sbctl set service_monitor sw1-p1 status=offline
 wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=offline
+check ovn-nbctl --wait=hv sync
 
 AT_CAPTURE_FILE([sbflows4])
 OVS_WAIT_FOR_OUTPUT(
@@ -1150,6 +1152,7 @@  OVS_WAIT_FOR_OUTPUT(
 ovn-sbctl set service_monitor $sm_sw0_p1 status=offline
 
 wait_row_count Service_Monitor 1 logical_port=sw0-p1 status=offline
+check ovn-nbctl --wait=hv sync
 
 AT_CAPTURE_FILE([sbflows5])
 OVS_WAIT_FOR_OUTPUT(
@@ -1166,6 +1169,7 @@  ovn-sbctl set service_monitor $sm_sw0_p1 status=online
 ovn-sbctl set service_monitor $sm_sw1_p1 status=online
 
 wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=online
+check ovn-nbctl --wait=hv sync
 
 AT_CAPTURE_FILE([sbflows7])
 OVS_WAIT_FOR_OUTPUT(
@@ -1176,6 +1180,7 @@  OVS_WAIT_FOR_OUTPUT(
 # Set the service monitor for sw1-p1 to error
 ovn-sbctl set service_monitor $sm_sw1_p1 status=error
 wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=error
+check ovn-nbctl --wait=hv sync
 
 ovn-sbctl dump-flows sw0 | grep "ip4.dst == 10.0.0.10 && tcp.dst == 80" \
 | grep priority=120 > lflows.txt
@@ -1214,6 +1219,7 @@  OVS_WAIT_FOR_OUTPUT(
 check ovn-sbctl set service_monitor sw1-p1 status=online
 
 wait_row_count Service_Monitor 1 logical_port=sw1-p1 status=online
+check ovn-nbctl --wait=hv sync
 
 AT_CAPTURE_FILE([sbflows10])
 OVS_WAIT_FOR_OUTPUT(
@@ -1244,6 +1250,7 @@  AT_CHECK([ovn-nbctl -- --id=@hc create Load_Balancer_Health_Check vip="10.0.0.10
 
 check ovn-nbctl ls-lb-add sw0 lb2
 check ovn-nbctl ls-lb-add sw1 lb2
+check ovn-nbctl --wait=hv sync
 
 wait_row_count Service_Monitor 5
 
diff --git a/tests/ovn.at b/tests/ovn.at
index 243e17ad8b16..bb4bb22fef5c 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -1790,10 +1790,6 @@  check ovn-nbctl --wait=hv sync
 # for ARP resolution).
 OVN_POPULATE_ARP
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
-
 # Make sure there is no attempt to adding duplicated flows by ovn-controller
 AT_FAIL_IF([test -n "`grep duplicate hv1/ovn-controller.log`"])
 AT_FAIL_IF([test -n "`grep duplicate hv2/ovn-controller.log`"])
@@ -1988,11 +1984,7 @@  done
 
 # set address for lp13 with invalid characters.
 # lp13 should be configured with only 192.168.0.13.
-ovn-nbctl lsp-set-addresses lp13 "f0:00:00:00:00:13 192.168.0.13 invalid 192.169.0.13"
-
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+ovn-nbctl --wait=hv lsp-set-addresses lp13 "f0:00:00:00:00:13 192.168.0.13 invalid 192.169.0.13"
 
 sip=`ip_to_hex 192 168 0 11`
 tip=`ip_to_hex 192 168 0 13`
@@ -2067,7 +2059,11 @@  for i in 1 2; do
     done
 done
 
-sleep 1
+# Wait for bindings to take effect.
+wait_row_count Port_Binding 1 logical_port=lp11 'encap!=[[]]'
+wait_row_count Port_Binding 1 logical_port=lp12 'encap!=[[]]'
+wait_row_count Port_Binding 1 logical_port=lp21 'encap!=[[]]'
+wait_row_count Port_Binding 1 logical_port=lp22 'encap!=[[]]'
 
 # dump port bindings; since we have vxlan and geneve tunnels, we expect the
 # ports to be bound to geneve tunnels.
@@ -2087,9 +2083,7 @@  check_row_count Port_Binding 1 logical_port=lp22 encap=$encap_rec
 # for ARP resolution).
 OVN_POPULATE_ARP
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Make sure there is no attempt to adding duplicated flows by ovn-controller
 AT_FAIL_IF([test -n "`grep duplicate hv1/ovn-controller.log`"])
@@ -3205,9 +3199,7 @@  ovs-vsctl add-port br-phys vif3 -- set Interface vif3 options:tx_pcap=hv3/vif3-t
 # for ARP resolution).
 OVN_POPULATE_ARP
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # test_packet INPORT DST SRC ETHTYPE OUTPORT...
 #
@@ -3373,9 +3365,7 @@  ovs-vsctl add-port br-phys vif3 -- set Interface vif3 options:tx_pcap=hv3/vif3-t
 # for ARP resolution).
 OVN_POPULATE_ARP
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # test_packet INPORT DST SRC ETHTYPE OUTPORT...
 #
@@ -3573,8 +3563,7 @@  check ovn-nbctl --wait=hv sync
 # for ARP resolution).
 OVN_POPULATE_ARP
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
+check ovn-nbctl --wait=hv sync
 
 # test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
 #
@@ -3974,8 +3963,7 @@  done
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
 #
@@ -4148,8 +4136,7 @@  done
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Given the name of a logical port, prints the name of the hypervisor
 # on which it is located.
@@ -4583,8 +4570,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Packet to send.
 packet="inport==\"ls1-lp1\" && eth.src==$ls1_lp1_mac && eth.dst==$rp_ls1_mac &&
@@ -4695,9 +4681,7 @@  ovs-vsctl -- add-port br-int vif2 -- \
     ofport-request=1
 
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Send ip packets between the two ports.
 
@@ -4729,11 +4713,7 @@  as hv1 ovs-ofctl dump-flows br-int
 
 
 #Disable router R1
-ovn-nbctl set Logical_Router R1 enabled=false
-
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+ovn-nbctl --wait=hv set Logical_Router R1 enabled=false
 
 echo "---------SB dump-----"
 ovn-sbctl list datapath_binding
@@ -4809,10 +4789,8 @@  ovs-vsctl -- add-port br-int vif2 -- \
     options:rxq_pcap=hv1/vif2-rx.pcap \
     ofport-request=1
 
+check ovn-nbctl --wait=hv sync
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
 
 # Send ip packets between the two ports.
 
@@ -4843,7 +4821,7 @@  echo "------ hv1 dump ----------"
 as hv1 ovs-ofctl dump-flows br-int
 
 #Disable router R1
-ovn-nbctl set Logical_Router R1 enabled=false
+ovn-nbctl --wait=hv set Logical_Router R1 enabled=false
 
 echo "---------SB dump-----"
 ovn-sbctl list datapath_binding
@@ -4854,9 +4832,6 @@  echo "---------------------"
 echo "------ hv1 dump ----------"
 as hv1 ovs-ofctl dump-flows br-int
 
-# Allow some time for the disabling of logical router R1 to propagate.
-# XXX This should be more systematic.
-sleep 1
 
 as hv1 ovs-appctl netdev-dummy/receive vif1 $packet
 
@@ -4961,8 +4936,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Send ip packets between foo1 and alice1
 src_mac="f00000010203"
@@ -5184,8 +5158,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Send ip packets between foo1 and alice1
 src_mac="f00000010203"
@@ -5322,7 +5295,7 @@  as hv1 ovs-appctl vlog/set dbg
 
 OVN_POPULATE_ARP
 
-sleep 2
+check ovn-nbctl --wait=hv sync
 
 as hv1 ovs-vsctl show
 
@@ -5987,7 +5960,7 @@  ovs-vsctl -- add-port br-int hv1-vif5 -- \
 
 OVN_POPULATE_ARP
 
-sleep 2
+check ovn-nbctl --wait=hv sync
 
 trim_zeros() {
     sed 's/\(00\)\{1,\}$//'
@@ -6269,10 +6242,7 @@  ovn-nbctl lsp-add foo foo1 \
 ovn-nbctl lsp-add alice alice1 \
 -- lsp-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2"
 
-
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 2
+check ovn-nbctl --wait=hv sync
 
 # Send ip packets between foo1 and alice1
 src_mac="f00000010203"
@@ -6335,7 +6305,7 @@  ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \
 R2 static_routes @lrt
 
 # Wait for ovn-controller to catch up.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Send the packet again.
 as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
@@ -6407,10 +6377,8 @@  ovs-vsctl -- add-port br-int vif2 -- \
     options:rxq_pcap=hv1/vif2-rx.pcap \
     ofport-request=1
 
-
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 
 for i in 1 2; do
@@ -6568,10 +6536,6 @@  ovs-vsctl -- add-port br-int vif3 -- \
     options:rxq_pcap=pbr-hv/vif3-rx.pcap \
     ofport-request=1
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
-
 ls1_ro_mac=00:00:00:01:02:f1
 ls1_ro_ip=192.168.1.1
 
@@ -6758,10 +6722,6 @@  ovs-vsctl -- add-port br-int vif3 -- \
     options:rxq_pcap=pbr-hv/vif3-rx.pcap \
     ofport-request=1
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
-
 ls1_ro_mac=00:00:00:01:02:f1
 ls1_ro_ip=2001::1
 
@@ -7016,14 +6976,11 @@  ovn-nbctl acl-add lsw0 to-lport 1002 'outport == "lp1" && ip6 && icmp6'  allow-r
 ovn-nbctl acl-add lsw0 to-lport 1002 'outport == "lp2" && ip6 && icmp6'  allow-related
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
+# XXX The "sleep" here seems to be essential for ovn-northd-ddlog,
+# which may indicate that it needs improvement.
+check ovn-nbctl --wait=hv sync
 sleep 1
 
-# Given the name of a logical port, prints the name of the hypervisor
-# on which it is located.
-vif_to_hv() {
-    echo hv1${1%?}
-}
 for i in 1 2; do
     : > $i.expected
 done
@@ -7064,9 +7021,7 @@  ovn_attach n1 br-phys 192.168.0.1
 
 row=`ovn-nbctl create Address_Set name=set1 addresses=\"1.1.1.1\"`
 ovn-nbctl set Address_Set $row name=set1 addresses=\"1.1.1.1,1.1.1.2\"
-ovn-nbctl destroy Address_Set $row
-
-sleep 1
+ovn-nbctl --wait=hv destroy Address_Set $row
 
 # A bug previously existed in the address set support code
 # that caused ovn-controller to crash after an address set
@@ -7454,8 +7409,7 @@  ovs-vsctl -- add-port br-int hv1-vif3 -- \
     ofport-request=3
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Send ip packets between foo1 and foo2
 src_mac="0a0000a80103"
@@ -7666,8 +7620,7 @@  ovs-vsctl -- add-port br-int hv1-ls2lp2 -- \
     ofport-request=2
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 echo "---------NB dump-----"
 ovn-nbctl show
@@ -8796,8 +8749,7 @@  ovs-vsctl -- add-port br-int vm2 -- \
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Test that ovn-controllers create ct-zone entry for container ports.
 foo1_zoneid=$(as hv1 ovs-vsctl get bridge br-int external_ids:ct-zone-foo1)
@@ -9105,8 +9057,7 @@  ovn-nbctl --wait=hv lsp-add bob bob1 \
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 trim_zeros() {
     sed 's/\(00\)\{1,\}$//'
@@ -9215,7 +9166,7 @@  ovs-vsctl -- add-port br-int hv1-vif2 -- \
     ofport-request=2
 
 OVN_POPULATE_ARP
-sleep 2
+check ovn-nbctl --wait=hv sync
 as hv1 ovs-vsctl show
 
 echo "*************************"
@@ -9777,13 +9728,9 @@  test_ip_packet()
     fi
     as ext1 reset_pcap_file ext1-vif1 ext1/vif1
 
-    sleep 1
-
     # Resend packet from foo1 to outside1
     check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
 
-    sleep 1
-
     AT_CAPTURE_FILE([exp])
     AT_CAPTURE_FILE([rcv])
     check_packets() {
@@ -9987,8 +9934,7 @@  hv1_ch_uuid=$(fetch_column Chassis _uuid name=hv1)
 wait_column "$hv1_ch_uuid" HA_Chassis_Group ref_chassis
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 2
+check ovn-nbctl --wait=hv sync
 
 reset_pcap_file() {
     local iface=$1
@@ -10380,8 +10326,7 @@  ovn-nbctl lsp-add foo foo2 \
 -- lsp-set-addresses foo2 "f0:00:00:01:02:06 192.168.1.3"
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 : > hv1-vif2.expected
 
@@ -10472,10 +10417,6 @@  AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown])
 AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet])
 AT_CHECK([ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1])
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 2
-
 # Expect no packets when hv2 bridge-mapping is not present
 : > packets
 OVN_CHECK_PACKETS([hv1/snoopvif-tx.pcap], [packets])
@@ -10697,16 +10638,15 @@  ovn-nbctl lsp-set-addresses ln-outside unknown
 ovn-nbctl lsp-set-type ln-outside localnet
 ovn-nbctl lsp-set-options ln-outside network_name=phys
 
-# Allow some time for ovn-northd and ovn-controller to catch up.
-check ovn-nbctl --wait=hv sync
-
 # Check that there is a logical flow in logical switch foo's pipeline
 # to set the outport to rp-foo (which is expected).
 OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
 grep rp-foo | grep -v is_chassis_resident | grep priority=50 -c`])
 
 # Set the option 'reside-on-redirect-chassis' for foo
-check ovn-nbctl --wait=hv set logical_router_port foo options:reside-on-redirect-chassis=true
+check ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
+check ovn-nbctl --wait=hv sync
+
 # Check that there is a logical flow in logical switch foo's pipeline
 # to set the outport to rp-foo with the condition is_chassis_redirect.
 ovn-sbctl dump-flows foo
@@ -10935,8 +10875,7 @@  ovs-vsctl -- add-port br-int hv1-vif3 -- \
     ofport-request=3
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 reset_pcap_file() {
     local iface=$1
@@ -11191,8 +11130,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Send ip packets between foo1 and alice1
 src_mac="f00000010203"
@@ -12776,8 +12714,6 @@  for i in 1 2 3; do
 done
 
 OVN_POPULATE_ARP
-# allow some time for ovn-northd and ovn-controller to catch up.
-sleep 1
 
 for i in 1 2 3; do
     : > vif${i}1.expected
@@ -12936,8 +12872,7 @@  done
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
 #
@@ -13016,10 +12951,6 @@  for is in 1 2 3; do
   done
 done
 
-# Allow some time for packet forwarding.
-# XXX This can be improved.
-sleep 1
-
 # Now check the packets actually received against the ones expected.
 for i in 1 2 3; do
     for j in 1 2 3; do
@@ -13160,8 +13091,7 @@  done
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 lsp_to_mac() {
     echo f0:00:00:00:0${1:0:1}:${1:1:2}
@@ -13267,10 +13197,6 @@  for is in 1 2 3; do
   done
 done
 
-# Allow some time for packet forwarding.
-# XXX This can be improved.
-sleep 1
-
 # Now check the packets actually received against the ones expected.
 for i in 1 2 3; do
     for j in 1 2 3; do
@@ -13741,8 +13667,6 @@  reset_pcap_file hv1-vif2 hv1/vif2
 rm -f 2.packets
 > 2.expected
 
-#sleep infinity
-
 # Remove the first less restrictive allow ACL.
 ovn-nbctl acl-del ls1 to-lport 3 'ip4.src==10.0.0.1 || ip4.src==10.0.0.1'
 ovn-nbctl --wait=hv sync
@@ -13865,8 +13789,7 @@  ovn-nbctl create Address_Set name=set1 addresses=\"f0:00:00:00:00:11\",\"f0:00:0
 OVN_POPULATE_ARP
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
-# XXX This should be more systematic.
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 # Make sure there is no attempt to adding duplicated flows by ovn-controller
 AT_FAIL_IF([test -n "`grep duplicate hv1/ovn-controller.log`"])
@@ -14327,7 +14250,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
 
 OVN_POPULATE_ARP
 
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 packet="inport==\"sw1-p1\" && eth.src==$sw1_p1_mac && eth.dst==$sw1_ro_mac &&
        ip4 && ip.ttl==64 && ip4.src==$sw1_p1_ip && ip4.dst==$sw2_p1_ip &&
@@ -15250,7 +15173,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
 
 OVN_POPULATE_ARP
 
-sleep 1
+check ovn-nbctl --wait=hv sync
 
 packet="inport==\"sw1-p1\" && eth.src==$sw1_p1_mac && eth.dst==$sw1_ro_mac &&
        ip4 && ip.ttl==64 && ip4.src==$sw1_p1_ip && ip4.dst==$sw2_p1_ip &&