diff mbox series

[ovs-dev] patch.c: Avoid patch interface deletion & recreation during restart.

Message ID 20220628004934.282319-1-hzhou@ovn.org
State Changes Requested
Headers show
Series [ovs-dev] patch.c: Avoid patch interface deletion & recreation during restart. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Han Zhou June 28, 2022, 12:49 a.m. UTC
When ovn-controller is restarted, it may need multiple iterations of
main loop before completely download all related data from SB DB,
especially when ovn-monitor-all=false, so after restart, before it sees
the related localnet ports from SB DB, it treats the related patch
ports on the chassis as not needed, and deleted them. Later when it
downloads thoses port-bindings it recreates them.  For a graceful
upgrade, we don't this to happen, because it would break the traffic.

This is especially problematic at ovn-k8s setups because the external
side of the patch port is used to program some flows for external
network communication. When the patch ports are recreated, the OF port
number changes and those flows are broken. The CMS would detect and fix
the flows but it would result in even longer downtime.

This patch postpone the deletion of the patch ports, with the assumption
that leaving the unused ports on the chassis for little longer is not
harmful.

Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 controller/patch.c      | 15 ++++++++-
 tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)

Comments

Dumitru Ceara June 28, 2022, 3:20 p.m. UTC | #1
On 6/28/22 02:49, Han Zhou wrote:
> When ovn-controller is restarted, it may need multiple iterations of
> main loop before completely download all related data from SB DB,
> especially when ovn-monitor-all=false, so after restart, before it sees
> the related localnet ports from SB DB, it treats the related patch
> ports on the chassis as not needed, and deleted them. Later when it
> downloads thoses port-bindings it recreates them.  For a graceful
> upgrade, we don't this to happen, because it would break the traffic.
> 
> This is especially problematic at ovn-k8s setups because the external
> side of the patch port is used to program some flows for external
> network communication. When the patch ports are recreated, the OF port
> number changes and those flows are broken. The CMS would detect and fix
> the flows but it would result in even longer downtime.
> 
> This patch postpone the deletion of the patch ports, with the assumption
> that leaving the unused ports on the chassis for little longer is not
> harmful.
> 
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>  controller/patch.c      | 15 ++++++++-
>  tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/controller/patch.c b/controller/patch.c
> index ed831302c..9faae5879 100644
> --- a/controller/patch.c
> +++ b/controller/patch.c
> @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>  
>      /* Now 'existing_ports' only still contains patch ports that exist in the
>       * database but shouldn't.  Delete them from the database. */
> +
> +    /* Wait for some iterations before really deleting any patch ports, because
> +     * with conditional monitoring it is possible that related SB data is not
> +     * completely downloaded yet after last restart of ovn-controller.
> +     * Otherwise it may cause unncessary dataplane interruption during
> +     * restart/upgrade. */
> +    static int delete_patch_port_delay = 10;

Hi Han,

It's possible that ovn-controller wakes up 10 times in a row immediately
due to local OVSDB changes and doesn't process any new SB updates in
that time so 10 might not be enough in some cases.

Does it make sense to wait a given amount of time instead?  E.g., a few
seconds?  Can we make this configurable somehow?  Maybe an
ovn-controller command line argument to override the default?

Thanks,
Dumitru
Han Zhou June 28, 2022, 5:18 p.m. UTC | #2
On Tue, Jun 28, 2022 at 8:20 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 6/28/22 02:49, Han Zhou wrote:
> > When ovn-controller is restarted, it may need multiple iterations of
> > main loop before completely download all related data from SB DB,
> > especially when ovn-monitor-all=false, so after restart, before it sees
> > the related localnet ports from SB DB, it treats the related patch
> > ports on the chassis as not needed, and deleted them. Later when it
> > downloads thoses port-bindings it recreates them.  For a graceful
> > upgrade, we don't this to happen, because it would break the traffic.
> >
> > This is especially problematic at ovn-k8s setups because the external
> > side of the patch port is used to program some flows for external
> > network communication. When the patch ports are recreated, the OF port
> > number changes and those flows are broken. The CMS would detect and fix
> > the flows but it would result in even longer downtime.
> >
> > This patch postpone the deletion of the patch ports, with the assumption
> > that leaving the unused ports on the chassis for little longer is not
> > harmful.
> >
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> >  controller/patch.c      | 15 ++++++++-
> >  tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+), 1 deletion(-)
> >
> > diff --git a/controller/patch.c b/controller/patch.c
> > index ed831302c..9faae5879 100644
> > --- a/controller/patch.c
> > +++ b/controller/patch.c
> > @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
> >
> >      /* Now 'existing_ports' only still contains patch ports that exist
in the
> >       * database but shouldn't.  Delete them from the database. */
> > +
> > +    /* Wait for some iterations before really deleting any patch
ports, because
> > +     * with conditional monitoring it is possible that related SB data
is not
> > +     * completely downloaded yet after last restart of ovn-controller.
> > +     * Otherwise it may cause unncessary dataplane interruption during
> > +     * restart/upgrade. */
> > +    static int delete_patch_port_delay = 10;
>
> Hi Han,
>
> It's possible that ovn-controller wakes up 10 times in a row immediately
> due to local OVSDB changes and doesn't process any new SB updates in
> that time so 10 might not be enough in some cases.
>

Thanks Dumitru for the review!
In theory I agree with you 10 times is not 100% guaranteeing SB update
completed if other things are triggering the wakeups. However, in practice,
the purpose here is for the start/restart scenario. I think it is very
unlikely that local OVSDB is changing that frequently at the same time when
ovn-controller is being restarted. We have some similar logic (not ideal
for sure) at vif-plug.c:vif_plug_run() for similar reasons.

> Does it make sense to wait a given amount of time instead?  E.g., a few
> seconds?  Can we make this configurable somehow?  Maybe an
> ovn-controller command line argument to override the default?

Waiting for a given amount of time is also not ideal. It is possible that
when ovn-controller starts the SB IDL is not connected (due to server side
problems/ control plane network problems, etc.) so we don't know how long
it should wait at all.

I am ok with adding command line arguments to adjust, but I'd really want
to avoid that unless it is proved to be necessary. I'd rather use a bigger
hardcoded value to avoid another knob which is not easy to understand by
users - it should be something handled by the code itself.

Thanks,
Han

>
> Thanks,
> Dumitru
>
>
Dumitru Ceara June 29, 2022, 8:36 a.m. UTC | #3
On 6/28/22 19:18, Han Zhou wrote:
> On Tue, Jun 28, 2022 at 8:20 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 6/28/22 02:49, Han Zhou wrote:
>>> When ovn-controller is restarted, it may need multiple iterations of
>>> main loop before completely download all related data from SB DB,
>>> especially when ovn-monitor-all=false, so after restart, before it sees
>>> the related localnet ports from SB DB, it treats the related patch
>>> ports on the chassis as not needed, and deleted them. Later when it
>>> downloads thoses port-bindings it recreates them.  For a graceful
>>> upgrade, we don't this to happen, because it would break the traffic.
>>>
>>> This is especially problematic at ovn-k8s setups because the external
>>> side of the patch port is used to program some flows for external
>>> network communication. When the patch ports are recreated, the OF port
>>> number changes and those flows are broken. The CMS would detect and fix
>>> the flows but it would result in even longer downtime.
>>>
>>> This patch postpone the deletion of the patch ports, with the assumption
>>> that leaving the unused ports on the chassis for little longer is not
>>> harmful.
>>>
>>> Signed-off-by: Han Zhou <hzhou@ovn.org>
>>> ---
>>>  controller/patch.c      | 15 ++++++++-
>>>  tests/ovn-controller.at | 71 +++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 85 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/controller/patch.c b/controller/patch.c
>>> index ed831302c..9faae5879 100644
>>> --- a/controller/patch.c
>>> +++ b/controller/patch.c
>>> @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>>>
>>>      /* Now 'existing_ports' only still contains patch ports that exist
> in the
>>>       * database but shouldn't.  Delete them from the database. */
>>> +
>>> +    /* Wait for some iterations before really deleting any patch
> ports, because
>>> +     * with conditional monitoring it is possible that related SB data
> is not
>>> +     * completely downloaded yet after last restart of ovn-controller.
>>> +     * Otherwise it may cause unncessary dataplane interruption during
>>> +     * restart/upgrade. */
>>> +    static int delete_patch_port_delay = 10;
>>
>> Hi Han,
>>
>> It's possible that ovn-controller wakes up 10 times in a row immediately
>> due to local OVSDB changes and doesn't process any new SB updates in
>> that time so 10 might not be enough in some cases.
>>
> 
> Thanks Dumitru for the review!
> In theory I agree with you 10 times is not 100% guaranteeing SB update
> completed if other things are triggering the wakeups. However, in practice,
> the purpose here is for the start/restart scenario. I think it is very
> unlikely that local OVSDB is changing that frequently at the same time when
> ovn-controller is being restarted. We have some similar logic (not ideal
> for sure) at vif-plug.c:vif_plug_run() for similar reasons.
> 

Ah I didn't know we do that already for vif-plug.  Thanks for pointing
it out!

>> Does it make sense to wait a given amount of time instead?  E.g., a few
>> seconds?  Can we make this configurable somehow?  Maybe an
>> ovn-controller command line argument to override the default?
> 
> Waiting for a given amount of time is also not ideal. It is possible that
> when ovn-controller starts the SB IDL is not connected (due to server side
> problems/ control plane network problems, etc.) so we don't know how long
> it should wait at all.
> 
> I am ok with adding command line arguments to adjust, but I'd really want
> to avoid that unless it is proved to be necessary. I'd rather use a bigger
> hardcoded value to avoid another knob which is not easy to understand by
> users - it should be something handled by the code itself.
> 

I understand your point against additional knobs.  Maybe we don't need
a new one.  What if we do a mixed approach?  We already have the
ovn-controller startup timestamp so we could have a single (hardcoded)
delay counter but also ensure that at least X seconds elapsed.  It might
be a bit over-engineered but what do you think of the following
incremental?

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 69615308e..153f742b1 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -863,11 +863,12 @@ static void
 store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
              const struct sbrec_chassis_private *chassis,
              const struct ovsrec_bridge *br_int,
-             unsigned int delay_nb_cfg_report, int64_t startup_ts)
+             unsigned int delay_nb_cfg_report)
 {
     struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
         ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
     uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
+    int64_t startup_ts = daemon_startup_ts();
 
     if (ovs_txn && br_int
             && startup_ts != smap_get_ullong(&br_int->external_ids,
@@ -3811,7 +3812,6 @@ main(int argc, char *argv[])
     /* Main loop. */
     exiting = false;
     restart = false;
-    int64_t startup_ts = time_wall_msec();
     bool sb_monitor_all = false;
     while (!exiting) {
         memory_run();
@@ -3864,7 +3864,7 @@ main(int argc, char *argv[])
             if (!new_ovnsb_cond_seqno) {
                 VLOG_INFO("OVNSB IDL reconnected, force recompute.");
                 engine_set_force_recompute(true);
-                vif_plug_reset_idl_prime_counter();
+                dbs_connected_record();
             }
             ovnsb_cond_seqno = new_ovnsb_cond_seqno;
         }
@@ -4153,7 +4153,7 @@ main(int argc, char *argv[])
             }
 
             store_nb_cfg(ovnsb_idl_txn, ovs_idl_txn, chassis_private,
-                         br_int, delay_nb_cfg_report, startup_ts);
+                         br_int, delay_nb_cfg_report);
 
             if (pending_pkt.conn) {
                 struct ed_type_addr_sets *as_data =
diff --git a/controller/patch.c b/controller/patch.c
index 9faae5879..d5879c580 100644
--- a/controller/patch.c
+++ b/controller/patch.c
@@ -313,16 +313,12 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
      * completely downloaded yet after last restart of ovn-controller.
      * Otherwise it may cause unncessary dataplane interruption during
      * restart/upgrade. */
-    static int delete_patch_port_delay = 10;
-    if (delete_patch_port_delay > 0) {
-        delete_patch_port_delay--;
-    }
-
+    bool keep_patch_ports = daemon_started_recently();
     struct shash_node *port_node;
     SHASH_FOR_EACH_SAFE (port_node, &existing_ports) {
         port = port_node->data;
         shash_delete(&existing_ports, port_node);
-        if (!delete_patch_port_delay) {
+        if (!keep_patch_ports) {
             remove_port(bridge_table, port);
         }
     }
diff --git a/controller/vif-plug.c b/controller/vif-plug.c
index c6fbe7e59..fc7a72a7d 100644
--- a/controller/vif-plug.c
+++ b/controller/vif-plug.c
@@ -532,22 +532,14 @@ vif_plug_handle_iface(const struct ovsrec_interface *iface_rec,
  * completeness of the initial data downloading we need this counter so that we
  * do not erronously unplug ports because the data is just not loaded yet.
  */
-#define VIF_PLUG_PRIME_IDL_COUNT_SEEED 10
-static int vif_plug_prime_idl_count = VIF_PLUG_PRIME_IDL_COUNT_SEEED;
-
-void
-vif_plug_reset_idl_prime_counter(void)
-{
-    vif_plug_prime_idl_count = VIF_PLUG_PRIME_IDL_COUNT_SEEED;
-}
-
 void
 vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
              struct vif_plug_ctx_out *vif_plug_ctx_out)
 {
-    if (vif_plug_prime_idl_count && --vif_plug_prime_idl_count > 0) {
-        VLOG_DBG("vif_plug_run: vif_plug_prime_idl_count=%d, will not unplug "
-                 "ports in this iteration.", vif_plug_prime_idl_count);
+    bool delay_plug = daemon_started_recently() || dbs_connected_recently();
+    if (delay_plug) {
+        VLOG_DBG("vif_plug_run: controller recently started, will not unplug "
+                 "ports in this iteration.");
     }
 
     if (!vif_plug_ctx_in->chassis_rec) {
@@ -557,7 +549,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
     OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec,
                                      vif_plug_ctx_in->iface_table) {
         vif_plug_handle_iface(iface_rec, vif_plug_ctx_in, vif_plug_ctx_out,
-                              !vif_plug_prime_idl_count);
+                              !delay_plug);
     }
 
     struct sbrec_port_binding *target =
@@ -573,7 +565,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
         enum en_lport_type lport_type = get_lport_type(pb);
         if (lport_type == LP_VIF) {
             vif_plug_handle_lport_vif(pb, vif_plug_ctx_in, vif_plug_ctx_out,
-                                      !vif_plug_prime_idl_count);
+                                      !delay_plug);
         }
     }
     sbrec_port_binding_index_destroy_row(target);
diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 616999eab..7b859d440 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -883,3 +883,44 @@ get_bridge(const struct ovsrec_bridge_table *bridge_table, const char *br_name)
     }
     return NULL;
 }
+
+static int64_t startup_ts;
+
+OVS_CONSTRUCTOR(startup_ts_initializer) {
+    startup_ts = time_wall_msec();
+}
+
+int64_t
+daemon_startup_ts(void)
+{
+    return startup_ts;
+}
+
+#define DAEMON_STARTUP_DELAY_SEED 10
+#define DAEMON_STARTUP_DELAY_MS   5000
+bool
+daemon_started_recently(void)
+{
+    static int controller_startup_delay = DAEMON_STARTUP_DELAY_SEED;
+
+    if (controller_startup_delay && --controller_startup_delay) {
+        return true;
+    }
+
+    /* Ensure that at least an amount of time has passed. */
+    return time_wall_msec() - startup_ts <= DAEMON_STARTUP_DELAY_MS;
+}
+
+#define DBS_CONNECTED_RECENTLY_DELAY_SEED 10
+static int dbs_connected_recently_delay = DBS_CONNECTED_RECENTLY_DELAY_SEED;
+void
+dbs_connected_record(void)
+{
+    dbs_connected_recently_delay = DBS_CONNECTED_RECENTLY_DELAY_SEED;
+}
+
+bool
+dbs_connected_recently(void)
+{
+    return dbs_connected_recently_delay && --dbs_connected_recently_delay;
+}
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index b3905ef7b..ed6b345a8 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -309,5 +309,9 @@ struct ovsrec_bridge_table;
 const struct ovsrec_bridge *get_bridge(const struct ovsrec_bridge_table *,
                                        const char *br_name);
 
+int64_t daemon_startup_ts(void);
+bool daemon_started_recently(void);
+void dbs_connected_record(void);
+bool dbs_connected_recently(void);
 
 #endif /* OVN_UTIL_H */
Han Zhou June 29, 2022, 7:49 p.m. UTC | #4
On Wed, Jun 29, 2022 at 1:37 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 6/28/22 19:18, Han Zhou wrote:
> > On Tue, Jun 28, 2022 at 8:20 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> On 6/28/22 02:49, Han Zhou wrote:
> >>> When ovn-controller is restarted, it may need multiple iterations of
> >>> main loop before completely download all related data from SB DB,
> >>> especially when ovn-monitor-all=false, so after restart, before it
sees
> >>> the related localnet ports from SB DB, it treats the related patch
> >>> ports on the chassis as not needed, and deleted them. Later when it
> >>> downloads thoses port-bindings it recreates them.  For a graceful
> >>> upgrade, we don't this to happen, because it would break the traffic.
> >>>
> >>> This is especially problematic at ovn-k8s setups because the external
> >>> side of the patch port is used to program some flows for external
> >>> network communication. When the patch ports are recreated, the OF port
> >>> number changes and those flows are broken. The CMS would detect and
fix
> >>> the flows but it would result in even longer downtime.
> >>>
> >>> This patch postpone the deletion of the patch ports, with the
assumption
> >>> that leaving the unused ports on the chassis for little longer is not
> >>> harmful.
> >>>
> >>> Signed-off-by: Han Zhou <hzhou@ovn.org>
> >>> ---
> >>>  controller/patch.c      | 15 ++++++++-
> >>>  tests/ovn-controller.at | 71
+++++++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 85 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/controller/patch.c b/controller/patch.c
> >>> index ed831302c..9faae5879 100644
> >>> --- a/controller/patch.c
> >>> +++ b/controller/patch.c
> >>> @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
> >>>
> >>>      /* Now 'existing_ports' only still contains patch ports that
exist
> > in the
> >>>       * database but shouldn't.  Delete them from the database. */
> >>> +
> >>> +    /* Wait for some iterations before really deleting any patch
> > ports, because
> >>> +     * with conditional monitoring it is possible that related SB
data
> > is not
> >>> +     * completely downloaded yet after last restart of
ovn-controller.
> >>> +     * Otherwise it may cause unncessary dataplane interruption
during
> >>> +     * restart/upgrade. */
> >>> +    static int delete_patch_port_delay = 10;
> >>
> >> Hi Han,
> >>
> >> It's possible that ovn-controller wakes up 10 times in a row
immediately
> >> due to local OVSDB changes and doesn't process any new SB updates in
> >> that time so 10 might not be enough in some cases.
> >>
> >
> > Thanks Dumitru for the review!
> > In theory I agree with you 10 times is not 100% guaranteeing SB update
> > completed if other things are triggering the wakeups. However, in
practice,
> > the purpose here is for the start/restart scenario. I think it is very
> > unlikely that local OVSDB is changing that frequently at the same time
when
> > ovn-controller is being restarted. We have some similar logic (not ideal
> > for sure) at vif-plug.c:vif_plug_run() for similar reasons.
> >
>
> Ah I didn't know we do that already for vif-plug.  Thanks for pointing
> it out!
>
> >> Does it make sense to wait a given amount of time instead?  E.g., a few
> >> seconds?  Can we make this configurable somehow?  Maybe an
> >> ovn-controller command line argument to override the default?
> >
> > Waiting for a given amount of time is also not ideal. It is possible
that
> > when ovn-controller starts the SB IDL is not connected (due to server
side
> > problems/ control plane network problems, etc.) so we don't know how
long
> > it should wait at all.
> >
> > I am ok with adding command line arguments to adjust, but I'd really
want
> > to avoid that unless it is proved to be necessary. I'd rather use a
bigger
> > hardcoded value to avoid another knob which is not easy to understand by
> > users - it should be something handled by the code itself.
> >
>
> I understand your point against additional knobs.  Maybe we don't need
> a new one.  What if we do a mixed approach?  We already have the
> ovn-controller startup timestamp so we could have a single (hardcoded)
> delay counter but also ensure that at least X seconds elapsed.  It might
> be a bit over-engineered but what do you think of the following
> incremental?
>

Thanks Dumitru. I am ok with adding a check for time passed in addition to
the iterations. The function daemon_started_recently() you added below
would have a problem because each caller of this function would trigger the
decrement of the controller_startup_delay, so if there are 10 callers it
would decrease to 0 with just one iteration. But the overall approach
demonstrated looks good. I will work on V2 with your idea incorporated.

In addition, it doesn't seem necessary to me for the vif_plug to reset the
counter when DB is reconnected, because I don't expect it to lose old data
in such cases - the monitor condition remains unchanged. So I think
dbs_connected_recently() is not really needed. I will probably touch the
vif_plug logic in a separate patch.

Thanks,
Han

> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 69615308e..153f742b1 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -863,11 +863,12 @@ static void
>  store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
>               const struct sbrec_chassis_private *chassis,
>               const struct ovsrec_bridge *br_int,
> -             unsigned int delay_nb_cfg_report, int64_t startup_ts)
> +             unsigned int delay_nb_cfg_report)
>  {
>      struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
>          ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
>      uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
> +    int64_t startup_ts = daemon_startup_ts();
>
>      if (ovs_txn && br_int
>              && startup_ts != smap_get_ullong(&br_int->external_ids,
> @@ -3811,7 +3812,6 @@ main(int argc, char *argv[])
>      /* Main loop. */
>      exiting = false;
>      restart = false;
> -    int64_t startup_ts = time_wall_msec();
>      bool sb_monitor_all = false;
>      while (!exiting) {
>          memory_run();
> @@ -3864,7 +3864,7 @@ main(int argc, char *argv[])
>              if (!new_ovnsb_cond_seqno) {
>                  VLOG_INFO("OVNSB IDL reconnected, force recompute.");
>                  engine_set_force_recompute(true);
> -                vif_plug_reset_idl_prime_counter();
> +                dbs_connected_record();
>              }
>              ovnsb_cond_seqno = new_ovnsb_cond_seqno;
>          }
> @@ -4153,7 +4153,7 @@ main(int argc, char *argv[])
>              }
>
>              store_nb_cfg(ovnsb_idl_txn, ovs_idl_txn, chassis_private,
> -                         br_int, delay_nb_cfg_report, startup_ts);
> +                         br_int, delay_nb_cfg_report);
>
>              if (pending_pkt.conn) {
>                  struct ed_type_addr_sets *as_data =
> diff --git a/controller/patch.c b/controller/patch.c
> index 9faae5879..d5879c580 100644
> --- a/controller/patch.c
> +++ b/controller/patch.c
> @@ -313,16 +313,12 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>       * completely downloaded yet after last restart of ovn-controller.
>       * Otherwise it may cause unncessary dataplane interruption during
>       * restart/upgrade. */
> -    static int delete_patch_port_delay = 10;
> -    if (delete_patch_port_delay > 0) {
> -        delete_patch_port_delay--;
> -    }
> -
> +    bool keep_patch_ports = daemon_started_recently();
>      struct shash_node *port_node;
>      SHASH_FOR_EACH_SAFE (port_node, &existing_ports) {
>          port = port_node->data;
>          shash_delete(&existing_ports, port_node);
> -        if (!delete_patch_port_delay) {
> +        if (!keep_patch_ports) {
>              remove_port(bridge_table, port);
>          }
>      }
> diff --git a/controller/vif-plug.c b/controller/vif-plug.c
> index c6fbe7e59..fc7a72a7d 100644
> --- a/controller/vif-plug.c
> +++ b/controller/vif-plug.c
> @@ -532,22 +532,14 @@ vif_plug_handle_iface(const struct ovsrec_interface
*iface_rec,
>   * completeness of the initial data downloading we need this counter so
that we
>   * do not erronously unplug ports because the data is just not loaded
yet.
>   */
> -#define VIF_PLUG_PRIME_IDL_COUNT_SEEED 10
> -static int vif_plug_prime_idl_count = VIF_PLUG_PRIME_IDL_COUNT_SEEED;
> -
> -void
> -vif_plug_reset_idl_prime_counter(void)
> -{
> -    vif_plug_prime_idl_count = VIF_PLUG_PRIME_IDL_COUNT_SEEED;
> -}
> -
>  void
>  vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
>               struct vif_plug_ctx_out *vif_plug_ctx_out)
>  {
> -    if (vif_plug_prime_idl_count && --vif_plug_prime_idl_count > 0) {
> -        VLOG_DBG("vif_plug_run: vif_plug_prime_idl_count=%d, will not
unplug "
> -                 "ports in this iteration.", vif_plug_prime_idl_count);
> +    bool delay_plug = daemon_started_recently() ||
dbs_connected_recently();
> +    if (delay_plug) {
> +        VLOG_DBG("vif_plug_run: controller recently started, will not
unplug "
> +                 "ports in this iteration.");
>      }
>
>      if (!vif_plug_ctx_in->chassis_rec) {
> @@ -557,7 +549,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
>      OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec,
>                                       vif_plug_ctx_in->iface_table) {
>          vif_plug_handle_iface(iface_rec, vif_plug_ctx_in,
vif_plug_ctx_out,
> -                              !vif_plug_prime_idl_count);
> +                              !delay_plug);
>      }
>
>      struct sbrec_port_binding *target =
> @@ -573,7 +565,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
>          enum en_lport_type lport_type = get_lport_type(pb);
>          if (lport_type == LP_VIF) {
>              vif_plug_handle_lport_vif(pb, vif_plug_ctx_in,
vif_plug_ctx_out,
> -                                      !vif_plug_prime_idl_count);
> +                                      !delay_plug);
>          }
>      }
>      sbrec_port_binding_index_destroy_row(target);
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index 616999eab..7b859d440 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -883,3 +883,44 @@ get_bridge(const struct ovsrec_bridge_table
*bridge_table, const char *br_name)
>      }
>      return NULL;
>  }
> +
> +static int64_t startup_ts;
> +
> +OVS_CONSTRUCTOR(startup_ts_initializer) {
> +    startup_ts = time_wall_msec();
> +}
> +
> +int64_t
> +daemon_startup_ts(void)
> +{
> +    return startup_ts;
> +}
> +
> +#define DAEMON_STARTUP_DELAY_SEED 10
> +#define DAEMON_STARTUP_DELAY_MS   5000
> +bool
> +daemon_started_recently(void)
> +{
> +    static int controller_startup_delay = DAEMON_STARTUP_DELAY_SEED;
> +
> +    if (controller_startup_delay && --controller_startup_delay) {
> +        return true;
> +    }
> +
> +    /* Ensure that at least an amount of time has passed. */
> +    return time_wall_msec() - startup_ts <= DAEMON_STARTUP_DELAY_MS;
> +}
> +
> +#define DBS_CONNECTED_RECENTLY_DELAY_SEED 10
> +static int dbs_connected_recently_delay =
DBS_CONNECTED_RECENTLY_DELAY_SEED;
> +void
> +dbs_connected_record(void)
> +{
> +    dbs_connected_recently_delay = DBS_CONNECTED_RECENTLY_DELAY_SEED;
> +}
> +
> +bool
> +dbs_connected_recently(void)
> +{
> +    return dbs_connected_recently_delay &&
--dbs_connected_recently_delay;
> +}
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index b3905ef7b..ed6b345a8 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -309,5 +309,9 @@ struct ovsrec_bridge_table;
>  const struct ovsrec_bridge *get_bridge(const struct ovsrec_bridge_table
*,
>                                         const char *br_name);
>
> +int64_t daemon_startup_ts(void);
> +bool daemon_started_recently(void);
> +void dbs_connected_record(void);
> +bool dbs_connected_recently(void);
>
>  #endif /* OVN_UTIL_H */
>
Dumitru Ceara June 30, 2022, 7:39 a.m. UTC | #5
On 6/29/22 21:49, Han Zhou wrote:
> On Wed, Jun 29, 2022 at 1:37 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 6/28/22 19:18, Han Zhou wrote:
>>> On Tue, Jun 28, 2022 at 8:20 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 6/28/22 02:49, Han Zhou wrote:
>>>>> When ovn-controller is restarted, it may need multiple iterations of
>>>>> main loop before completely download all related data from SB DB,
>>>>> especially when ovn-monitor-all=false, so after restart, before it
> sees
>>>>> the related localnet ports from SB DB, it treats the related patch
>>>>> ports on the chassis as not needed, and deleted them. Later when it
>>>>> downloads thoses port-bindings it recreates them.  For a graceful
>>>>> upgrade, we don't this to happen, because it would break the traffic.
>>>>>
>>>>> This is especially problematic at ovn-k8s setups because the external
>>>>> side of the patch port is used to program some flows for external
>>>>> network communication. When the patch ports are recreated, the OF port
>>>>> number changes and those flows are broken. The CMS would detect and
> fix
>>>>> the flows but it would result in even longer downtime.
>>>>>
>>>>> This patch postpone the deletion of the patch ports, with the
> assumption
>>>>> that leaving the unused ports on the chassis for little longer is not
>>>>> harmful.
>>>>>
>>>>> Signed-off-by: Han Zhou <hzhou@ovn.org>
>>>>> ---
>>>>>  controller/patch.c      | 15 ++++++++-
>>>>>  tests/ovn-controller.at | 71
> +++++++++++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 85 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/controller/patch.c b/controller/patch.c
>>>>> index ed831302c..9faae5879 100644
>>>>> --- a/controller/patch.c
>>>>> +++ b/controller/patch.c
>>>>> @@ -307,11 +307,24 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>>>>>
>>>>>      /* Now 'existing_ports' only still contains patch ports that
> exist
>>> in the
>>>>>       * database but shouldn't.  Delete them from the database. */
>>>>> +
>>>>> +    /* Wait for some iterations before really deleting any patch
>>> ports, because
>>>>> +     * with conditional monitoring it is possible that related SB
> data
>>> is not
>>>>> +     * completely downloaded yet after last restart of
> ovn-controller.
>>>>> +     * Otherwise it may cause unncessary dataplane interruption
> during
>>>>> +     * restart/upgrade. */
>>>>> +    static int delete_patch_port_delay = 10;
>>>>
>>>> Hi Han,
>>>>
>>>> It's possible that ovn-controller wakes up 10 times in a row
> immediately
>>>> due to local OVSDB changes and doesn't process any new SB updates in
>>>> that time so 10 might not be enough in some cases.
>>>>
>>>
>>> Thanks Dumitru for the review!
>>> In theory I agree with you 10 times is not 100% guaranteeing SB update
>>> completed if other things are triggering the wakeups. However, in
> practice,
>>> the purpose here is for the start/restart scenario. I think it is very
>>> unlikely that local OVSDB is changing that frequently at the same time
> when
>>> ovn-controller is being restarted. We have some similar logic (not ideal
>>> for sure) at vif-plug.c:vif_plug_run() for similar reasons.
>>>
>>
>> Ah I didn't know we do that already for vif-plug.  Thanks for pointing
>> it out!
>>
>>>> Does it make sense to wait a given amount of time instead?  E.g., a few
>>>> seconds?  Can we make this configurable somehow?  Maybe an
>>>> ovn-controller command line argument to override the default?
>>>
>>> Waiting for a given amount of time is also not ideal. It is possible
> that
>>> when ovn-controller starts the SB IDL is not connected (due to server
> side
>>> problems/ control plane network problems, etc.) so we don't know how
> long
>>> it should wait at all.
>>>
>>> I am ok with adding command line arguments to adjust, but I'd really
> want
>>> to avoid that unless it is proved to be necessary. I'd rather use a
> bigger
>>> hardcoded value to avoid another knob which is not easy to understand by
>>> users - it should be something handled by the code itself.
>>>
>>
>> I understand your point against additional knobs.  Maybe we don't need
>> a new one.  What if we do a mixed approach?  We already have the
>> ovn-controller startup timestamp so we could have a single (hardcoded)
>> delay counter but also ensure that at least X seconds elapsed.  It might
>> be a bit over-engineered but what do you think of the following
>> incremental?
>>
> 
> Thanks Dumitru. I am ok with adding a check for time passed in addition to
> the iterations. The function daemon_started_recently() you added below
> would have a problem because each caller of this function would trigger the
> decrement of the controller_startup_delay, so if there are 10 callers it

Oops, you're right.

> would decrease to 0 with just one iteration. But the overall approach
> demonstrated looks good. I will work on V2 with your idea incorporated.
> 

Cool.

> In addition, it doesn't seem necessary to me for the vif_plug to reset the
> counter when DB is reconnected, because I don't expect it to lose old data
> in such cases - the monitor condition remains unchanged. So I think
> dbs_connected_recently() is not really needed. I will probably touch the
> vif_plug logic in a separate patch.

I didn't spend too much time on understanding why vif_plug cared about
DB reconnects but now after you mentioned it it does sound like it's not
really needed.

Thanks,
Dumitru

> 
> Thanks,
> Han
> 
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 69615308e..153f742b1 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -863,11 +863,12 @@ static void
>>  store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
>>               const struct sbrec_chassis_private *chassis,
>>               const struct ovsrec_bridge *br_int,
>> -             unsigned int delay_nb_cfg_report, int64_t startup_ts)
>> +             unsigned int delay_nb_cfg_report)
>>  {
>>      struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
>>          ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
>>      uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
>> +    int64_t startup_ts = daemon_startup_ts();
>>
>>      if (ovs_txn && br_int
>>              && startup_ts != smap_get_ullong(&br_int->external_ids,
>> @@ -3811,7 +3812,6 @@ main(int argc, char *argv[])
>>      /* Main loop. */
>>      exiting = false;
>>      restart = false;
>> -    int64_t startup_ts = time_wall_msec();
>>      bool sb_monitor_all = false;
>>      while (!exiting) {
>>          memory_run();
>> @@ -3864,7 +3864,7 @@ main(int argc, char *argv[])
>>              if (!new_ovnsb_cond_seqno) {
>>                  VLOG_INFO("OVNSB IDL reconnected, force recompute.");
>>                  engine_set_force_recompute(true);
>> -                vif_plug_reset_idl_prime_counter();
>> +                dbs_connected_record();
>>              }
>>              ovnsb_cond_seqno = new_ovnsb_cond_seqno;
>>          }
>> @@ -4153,7 +4153,7 @@ main(int argc, char *argv[])
>>              }
>>
>>              store_nb_cfg(ovnsb_idl_txn, ovs_idl_txn, chassis_private,
>> -                         br_int, delay_nb_cfg_report, startup_ts);
>> +                         br_int, delay_nb_cfg_report);
>>
>>              if (pending_pkt.conn) {
>>                  struct ed_type_addr_sets *as_data =
>> diff --git a/controller/patch.c b/controller/patch.c
>> index 9faae5879..d5879c580 100644
>> --- a/controller/patch.c
>> +++ b/controller/patch.c
>> @@ -313,16 +313,12 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>>       * completely downloaded yet after last restart of ovn-controller.
>>       * Otherwise it may cause unncessary dataplane interruption during
>>       * restart/upgrade. */
>> -    static int delete_patch_port_delay = 10;
>> -    if (delete_patch_port_delay > 0) {
>> -        delete_patch_port_delay--;
>> -    }
>> -
>> +    bool keep_patch_ports = daemon_started_recently();
>>      struct shash_node *port_node;
>>      SHASH_FOR_EACH_SAFE (port_node, &existing_ports) {
>>          port = port_node->data;
>>          shash_delete(&existing_ports, port_node);
>> -        if (!delete_patch_port_delay) {
>> +        if (!keep_patch_ports) {
>>              remove_port(bridge_table, port);
>>          }
>>      }
>> diff --git a/controller/vif-plug.c b/controller/vif-plug.c
>> index c6fbe7e59..fc7a72a7d 100644
>> --- a/controller/vif-plug.c
>> +++ b/controller/vif-plug.c
>> @@ -532,22 +532,14 @@ vif_plug_handle_iface(const struct ovsrec_interface
> *iface_rec,
>>   * completeness of the initial data downloading we need this counter so
> that we
>>   * do not erronously unplug ports because the data is just not loaded
> yet.
>>   */
>> -#define VIF_PLUG_PRIME_IDL_COUNT_SEEED 10
>> -static int vif_plug_prime_idl_count = VIF_PLUG_PRIME_IDL_COUNT_SEEED;
>> -
>> -void
>> -vif_plug_reset_idl_prime_counter(void)
>> -{
>> -    vif_plug_prime_idl_count = VIF_PLUG_PRIME_IDL_COUNT_SEEED;
>> -}
>> -
>>  void
>>  vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
>>               struct vif_plug_ctx_out *vif_plug_ctx_out)
>>  {
>> -    if (vif_plug_prime_idl_count && --vif_plug_prime_idl_count > 0) {
>> -        VLOG_DBG("vif_plug_run: vif_plug_prime_idl_count=%d, will not
> unplug "
>> -                 "ports in this iteration.", vif_plug_prime_idl_count);
>> +    bool delay_plug = daemon_started_recently() ||
> dbs_connected_recently();
>> +    if (delay_plug) {
>> +        VLOG_DBG("vif_plug_run: controller recently started, will not
> unplug "
>> +                 "ports in this iteration.");
>>      }
>>
>>      if (!vif_plug_ctx_in->chassis_rec) {
>> @@ -557,7 +549,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
>>      OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec,
>>                                       vif_plug_ctx_in->iface_table) {
>>          vif_plug_handle_iface(iface_rec, vif_plug_ctx_in,
> vif_plug_ctx_out,
>> -                              !vif_plug_prime_idl_count);
>> +                              !delay_plug);
>>      }
>>
>>      struct sbrec_port_binding *target =
>> @@ -573,7 +565,7 @@ vif_plug_run(struct vif_plug_ctx_in *vif_plug_ctx_in,
>>          enum en_lport_type lport_type = get_lport_type(pb);
>>          if (lport_type == LP_VIF) {
>>              vif_plug_handle_lport_vif(pb, vif_plug_ctx_in,
> vif_plug_ctx_out,
>> -                                      !vif_plug_prime_idl_count);
>> +                                      !delay_plug);
>>          }
>>      }
>>      sbrec_port_binding_index_destroy_row(target);
>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>> index 616999eab..7b859d440 100644
>> --- a/lib/ovn-util.c
>> +++ b/lib/ovn-util.c
>> @@ -883,3 +883,44 @@ get_bridge(const struct ovsrec_bridge_table
> *bridge_table, const char *br_name)
>>      }
>>      return NULL;
>>  }
>> +
>> +static int64_t startup_ts;
>> +
>> +OVS_CONSTRUCTOR(startup_ts_initializer) {
>> +    startup_ts = time_wall_msec();
>> +}
>> +
>> +int64_t
>> +daemon_startup_ts(void)
>> +{
>> +    return startup_ts;
>> +}
>> +
>> +#define DAEMON_STARTUP_DELAY_SEED 10
>> +#define DAEMON_STARTUP_DELAY_MS   5000
>> +bool
>> +daemon_started_recently(void)
>> +{
>> +    static int controller_startup_delay = DAEMON_STARTUP_DELAY_SEED;
>> +
>> +    if (controller_startup_delay && --controller_startup_delay) {
>> +        return true;
>> +    }
>> +
>> +    /* Ensure that at least an amount of time has passed. */
>> +    return time_wall_msec() - startup_ts <= DAEMON_STARTUP_DELAY_MS;
>> +}
>> +
>> +#define DBS_CONNECTED_RECENTLY_DELAY_SEED 10
>> +static int dbs_connected_recently_delay =
> DBS_CONNECTED_RECENTLY_DELAY_SEED;
>> +void
>> +dbs_connected_record(void)
>> +{
>> +    dbs_connected_recently_delay = DBS_CONNECTED_RECENTLY_DELAY_SEED;
>> +}
>> +
>> +bool
>> +dbs_connected_recently(void)
>> +{
>> +    return dbs_connected_recently_delay &&
> --dbs_connected_recently_delay;
>> +}
>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>> index b3905ef7b..ed6b345a8 100644
>> --- a/lib/ovn-util.h
>> +++ b/lib/ovn-util.h
>> @@ -309,5 +309,9 @@ struct ovsrec_bridge_table;
>>  const struct ovsrec_bridge *get_bridge(const struct ovsrec_bridge_table
> *,
>>                                         const char *br_name);
>>
>> +int64_t daemon_startup_ts(void);
>> +bool daemon_started_recently(void);
>> +void dbs_connected_record(void);
>> +bool dbs_connected_recently(void);
>>
>>  #endif /* OVN_UTIL_H */
>>
>
diff mbox series

Patch

diff --git a/controller/patch.c b/controller/patch.c
index ed831302c..9faae5879 100644
--- a/controller/patch.c
+++ b/controller/patch.c
@@ -307,11 +307,24 @@  patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
 
     /* Now 'existing_ports' only still contains patch ports that exist in the
      * database but shouldn't.  Delete them from the database. */
+
+    /* Wait for some iterations before really deleting any patch ports, because
+     * with conditional monitoring it is possible that related SB data is not
+     * completely downloaded yet after last restart of ovn-controller.
+     * Otherwise it may cause unncessary dataplane interruption during
+     * restart/upgrade. */
+    static int delete_patch_port_delay = 10;
+    if (delete_patch_port_delay > 0) {
+        delete_patch_port_delay--;
+    }
+
     struct shash_node *port_node;
     SHASH_FOR_EACH_SAFE (port_node, &existing_ports) {
         port = port_node->data;
         shash_delete(&existing_ports, port_node);
-        remove_port(bridge_table, port);
+        if (!delete_patch_port_delay) {
+            remove_port(bridge_table, port);
+        }
     }
     shash_destroy(&existing_ports);
 }
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index b8db342b9..a3a9122ab 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2242,3 +2242,74 @@  OVS_WAIT_UNTIL([test 6 = $(as hv1 ovs-ofctl dump-flows br-int | grep -c $(port_b
 
 OVN_CLEANUP([hv1])
 AT_CLEANUP
+
+AT_SETUP([ovn-controller - restart should not delete patch ports])
+
+ovn_start
+
+net_add n1
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+check ovs-vsctl add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=lsp1
+check ovs-vsctl add-br br-ext -- \
+    set open . external-ids:ovn-bridge-mappings=extnet:br-ext
+
+# Create logical topology so that lsp1 has multiple hops to a localnet port,
+# which would require multiple iterations to download the related datapaths and
+# port_bindings from SB DB during startup.
+#
+# lsp1@hv1 -- ls1 -- lr1 -- ls2 -- lr2 -- ls-ext -- lsp-ext (localnet)
+
+check ovn-appctl -t ovn-controller vlog/set file:dbg
+check ovn-nbctl ls-add ls1 \
+    -- lsp-add ls1 lsp1 \
+    -- lsp-set-addresses lsp1 "f0:00:00:00:00:01 10.0.1.2"
+check ovn-nbctl lr-add lr1 \
+    -- lrp-add lr1 lrp-lr1-ls1 f0:00:aa:00:01:01 10.0.1.1/24 \
+    -- lsp-add ls1 lsp-ls1-lr1 \
+    -- lsp-set-type lsp-ls1-lr1 router \
+    -- lsp-set-options lsp-ls1-lr1 router-port=lrp-lr1-ls1 \
+    -- lsp-set-addresses lsp-ls1-lr1 router
+check ovn-nbctl ls-add ls2 \
+    -- lrp-add lr1 lrp-lr1-ls2 f0:00:aa:00:01:02 10.0.2.1/24 \
+    -- lsp-add ls2 lsp-ls2-lr1 \
+    -- lsp-set-type lsp-ls2-lr1 router \
+    -- lsp-set-options lsp-ls2-lr1 router-port=lrp-lr1-ls2 \
+    -- lsp-set-addresses lsp-ls2-lr1 router
+check ovn-nbctl lr-add lr2 \
+    -- lrp-add lr2 lrp-lr2-ls2 f0:00:aa:00:02:02 10.0.2.2/24 \
+    -- lsp-add ls2 lsp-ls2-lr2 \
+    -- lsp-set-type lsp-ls2-lr2 router \
+    -- lsp-set-options lsp-ls2-lr2 router-port=lrp-lr2-ls2 \
+    -- lsp-set-addresses lsp-ls2-lr2 router
+check ovn-nbctl ls-add ls-ext \
+    -- lrp-add lr2 lrp-lr2-ext f0:00:aa:00:02:03 10.0.3.1/24 \
+    -- lsp-add ls-ext lsp-ext-lr2 \
+    -- lsp-set-type lsp-ext-lr2 router \
+    -- lsp-set-options lsp-ext-lr2 router-port=lrp-lr2-ext \
+    -- lsp-set-addresses lsp-ext-lr2 router
+check ovn-nbctl lsp-add ls-ext lsp-ext \
+    -- lsp-set-type lsp-ext localnet \
+    -- lsp-set-options lsp-ext network_name=extnet \
+    -- lsp-set-addresses lsp-ext unknown
+check ovn-nbctl --wait=hv sync
+
+AT_CHECK([ovs-vsctl list-ports br-int | grep patch-br-int-to-lsp-ext], [0], [ignore])
+
+# Stop ovn-controller
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+# Start ovn-controller, which shouldn't cause any patch interface
+# deletion/recreation
+start_daemon ovn-controller
+for i in $(seq 20); do
+    check ovn-nbctl --wait=hv sync
+done
+
+AT_CHECK([grep "deleted interface patch" hv1/ovs-vswitchd.log], [1], [ignore])
+OVN_CLEANUP([hv1])
+AT_CLEANUP
+