
[ovs-dev,v4] northd: Allow delay of northd engine runs

Message ID 20230824052944.7905-1-amusil@redhat.com
State Accepted
Series [ovs-dev,v4] northd: Allow delay of northd engine runs

Checks

Context                                 Check    Description
ovsrobot/apply-robot                    success  apply and check: success
ovsrobot/github-robot-_Build_and_Test   success  github build: passed
ovsrobot/github-robot-_ovn-kubernetes   success  github build: passed

Commit Message

Ales Musil Aug. 24, 2023, 5:29 a.m. UTC
Add a config option called "northd-backoff-interval-ms" that delays
northd engine runs, with the delay capped by the option's value.
When the option is set to 0 or left unspecified, the engine
runs without any restrictions. If the value is >0, the next northd
engine run is delayed by the duration of the previous run, capped
by the config option.
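
The delay rule above can be sketched in isolation. This is a minimal
illustration, not the patch code: the helper name next_run_after() and
its parameters are hypothetical, and timestamps are passed in explicitly
where the patch reads them from time_msec().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: returns the earliest time
 * (in ms) at which the next engine run may start, given when the previous
 * run started and ended and the configured backoff cap. */
static int64_t
next_run_after(int64_t start_ms, int64_t end_ms, uint32_t backoff_ms)
{
    assert(end_ms >= start_ms);

    int64_t run_ms = end_ms - start_ms;
    int64_t cap_ms = (int64_t) backoff_ms;

    /* Delay by the length of the previous run, but never by more than
     * the cap.  A cap of 0 therefore disables the delay entirely. */
    return end_ms + (run_ms < cap_ms ? run_ms : cap_ms);
}
```

With a 200 ms cap, a 50 ms run postpones the next run by 50 ms, while a
500 ms run postpones it by only 200 ms.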

The primary reason to delay northd is to prevent it from consuming 100%
CPU all the time; a secondary benefit is the larger batch of NB updates
that can be processed in a single northd run. With the recent
changes to I-P the engine processes updates faster, but not
quite fast enough to keep up with NB changes,
which in turn can result in northd never going to sleep. With
the backoff period enabled, northd can sleep and process the DB
updates in a bigger batch.

In addition, to process updates as fast as possible, wake
northd immediately if changes have accumulated over a period
of 500 ms or longer.
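
The resulting run/skip decision can be sketched as a pure predicate.
This is a simplified illustration under assumptions: the struct and
function names are hypothetical, the current time is passed in, and the
sketch leaves out the poll-loop timer that the patch arms when the
engine has to wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 500 ms accumulation limit described above. */
#define MAX_IDL_ACCUMULATION_MS 500

/* Hypothetical state for the sketch, not the patch's struct. */
struct backoff_state {
    int64_t next_run_ms;          /* Earliest allowed start of the next run. */
    uint64_t nb_idl_duration_ms;  /* Time spent accumulating NB changes. */
    uint64_t sb_idl_duration_ms;  /* Time spent accumulating SB changes. */
    bool recompute;               /* A forced recompute is pending. */
};

/* Returns true if the engine may run now: a recompute is pending, the
 * backoff has expired, or either IDL loop hit the accumulation limit. */
static bool
engine_may_run(const struct backoff_state *s, int64_t now_ms)
{
    assert(s != NULL);

    return s->recompute
           || now_ms >= s->next_run_ms
           || s->nb_idl_duration_ms >= MAX_IDL_ACCUMULATION_MS
           || s->sb_idl_duration_ms >= MAX_IDL_ACCUMULATION_MS;
}
```

The accumulation checks are what let a busy database bypass the backoff,
so the delay only applies while changes arrive slowly enough to batch.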

The results are very noticeable during scale testing.
Run without any backoff period:
northd aggregate CPU 9810% avg / 12765% max
northd was spinning at 100% CPU the entire second half of the test.

Run with 200 ms max backoff period:
northd aggregate CPU 6066% avg / 7689% max
northd was around 60% for the second half of the test.

One thing to note is that the overall latency increased
slightly, from P99 4s to P99 4.1s.
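
To put the figures above in relative terms, the change can be computed
as a percentage. The helper below is a hypothetical illustration and not
part of the patch; it only restates the numbers already given.

```c
#include <assert.h>

/* Hypothetical helper: relative change from 'before' to 'after', in
 * percent.  Negative values mean a reduction. */
static double
relative_change_pct(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

Applied to the numbers above, average CPU drops by roughly 38%
(9810 to 6066), maximum CPU by roughly 40% (12765 to 7689), and P99
latency rises by about 2.5% (4s to 4.1s).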

Signed-off-by: Ales Musil <amusil@redhat.com>
---
v4: Skip the northd engine delay if more changes have accumulated in the IDL loop.
---
 NEWS                     |  2 ++
 northd/inc-proc-northd.c | 27 +++++++++++++++++++---
 northd/inc-proc-northd.h | 13 ++++++++++-
 northd/ovn-northd.c      | 48 ++++++++++++++++++++++++----------------
 ovn-nb.xml               |  9 ++++++++
 5 files changed, 76 insertions(+), 23 deletions(-)

Comments

Mark Michelson Aug. 25, 2023, 5:31 p.m. UTC | #1
Thanks for the changes Ales.

Acked-by: Mark Michelson <mmichels@redhat.com>

Dumitru Ceara Aug. 30, 2023, 11:03 a.m. UTC | #2
Thanks, Ales and Mark, I applied this to the main branch.

Regards,
Dumitru

Patch

diff --git a/NEWS b/NEWS
index 93d4bedcd..0667e7f94 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,8 @@  Post v23.06.0
     Existing sessions might get re-hashed to a different ECMP path when
     OVN detects the algorithm support in the datapath during an upgrade
     or restart of ovn-controller.
+  - Add "northd-backoff-interval-ms" config option to delay northd engine
+    runs capped at the set value.
 
 OVN v23.06.0 - 01 Jun 2023
 --------------------------
diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
index d328deb22..0b759ae1d 100644
--- a/northd/inc-proc-northd.c
+++ b/northd/inc-proc-northd.c
@@ -295,16 +295,18 @@  void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
 /* Returns true if the incremental processing ended up updating nodes. */
 bool inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn,
                          struct ovsdb_idl_txn *ovnsb_txn,
-                         bool recompute) {
+                         struct northd_engine_context *ctx) {
     ovs_assert(ovnnb_txn && ovnsb_txn);
+
+    int64_t start = time_msec();
     engine_init_run();
 
     /* Force a full recompute if instructed to, for example, after a NB/SB
      * reconnect event.  However, make sure we don't overwrite an existing
      * force-recompute request if 'recompute' is false.
      */
-    if (recompute) {
-        engine_set_force_recompute(recompute);
+    if (ctx->recompute) {
+        engine_set_force_recompute(ctx->recompute);
     }
 
     struct engine_context eng_ctx = {
@@ -330,6 +332,12 @@  bool inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn,
     } else {
         engine_set_force_recompute(false);
     }
+
+    int64_t now = time_msec();
+    /* Postpone the next run by length of current run with maximum capped
+     * by "northd-backoff-interval-ms" interval. */
+    ctx->next_run_ms = now + MIN(now - start, ctx->backoff_ms);
+
     return engine_has_updated();
 }
 
@@ -339,6 +347,19 @@  void inc_proc_northd_cleanup(void)
     engine_set_context(NULL);
 }
 
+bool
+inc_proc_northd_can_run(struct northd_engine_context *ctx)
+{
+    if (ctx->recompute || time_msec() >= ctx->next_run_ms ||
+        ctx->nb_idl_duration_ms >= IDL_LOOP_MAX_DURATION_MS ||
+        ctx->sb_idl_duration_ms >= IDL_LOOP_MAX_DURATION_MS) {
+        return true;
+    }
+
+    poll_timer_wait_until(ctx->next_run_ms);
+    return false;
+}
+
 static void
 chassis_features_list(struct unixctl_conn *conn, int argc OVS_UNUSED,
                       const char *argv[] OVS_UNUSED, void *features_)
diff --git a/northd/inc-proc-northd.h b/northd/inc-proc-northd.h
index 9b81c7ee0..a2b9b7fdb 100644
--- a/northd/inc-proc-northd.h
+++ b/northd/inc-proc-northd.h
@@ -6,11 +6,22 @@ 
 #include "northd.h"
 #include "ovsdb-idl.h"
 
+#define IDL_LOOP_MAX_DURATION_MS 500
+
+struct northd_engine_context {
+    int64_t next_run_ms;
+    uint64_t nb_idl_duration_ms;
+    uint64_t sb_idl_duration_ms;
+    uint32_t backoff_ms;
+    bool recompute;
+};
+
 void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
                           struct ovsdb_idl_loop *sb);
 bool inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn,
                          struct ovsdb_idl_txn *ovnsb_txn,
-                         bool recompute);
+                         struct northd_engine_context *ctx);
 void inc_proc_northd_cleanup(void);
+bool inc_proc_northd_can_run(struct northd_engine_context *ctx);
 
 #endif /* INC_PROC_NORTHD */
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 4fa1b039e..ddb8c35e7 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -684,7 +684,8 @@  update_ssl_config(void)
 }
 
 static struct ovsdb_idl_txn *
-run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
+run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name,
+             uint64_t *idl_duration)
 {
     unsigned long long duration, start = time_msec();
     unsigned int seqno = UINT_MAX;
@@ -692,9 +693,9 @@  run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
     int n = 0;
 
     /* Accumulate database changes as long as there are some,
-     * but no longer than half a second. */
+     * but no longer than "IDL_LOOP_MAX_DURATION_MS". */
     while (seqno != ovsdb_idl_get_seqno(idl_loop->idl)
-           && time_msec() - start < 500) {
+           && time_msec() - start < IDL_LOOP_MAX_DURATION_MS) {
         seqno = ovsdb_idl_get_seqno(idl_loop->idl);
         ovsdb_idl_run(idl_loop->idl);
         n++;
@@ -703,13 +704,15 @@  run_idl_loop(struct ovsdb_idl_loop *idl_loop, const char *name)
     txn = ovsdb_idl_loop_run(idl_loop);
 
     duration = time_msec() - start;
+    *idl_duration = duration;
     /* ovsdb_idl_run() is called at least 2 times.  Once directly and
      * once in the ovsdb_idl_loop_run().  n > 2 means that we received
      * data on at least 2 subsequent calls. */
     if (n > 2 || duration > 100) {
-        VLOG(duration > 500 ? VLL_INFO : VLL_DBG,
+        VLOG(duration > IDL_LOOP_MAX_DURATION_MS ? VLL_INFO : VLL_DBG,
              "%s IDL run: %d iterations in %lld ms", name, n + 1, duration);
     }
+
     return txn;
 }
 
@@ -868,7 +871,8 @@  main(int argc, char *argv[])
     /* Main loop. */
     exiting = false;
 
-    bool recompute = false;
+    struct northd_engine_context eng_ctx = {};
+
     while (!exiting) {
         update_ssl_config();
         memory_run();
@@ -894,26 +898,28 @@  main(int argc, char *argv[])
                 ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
             }
 
-            struct ovsdb_idl_txn *ovnnb_txn = run_idl_loop(&ovnnb_idl_loop,
-                                                           "OVN_Northbound");
+            struct ovsdb_idl_txn *ovnnb_txn =
+                    run_idl_loop(&ovnnb_idl_loop, "OVN_Northbound",
+                                 &eng_ctx.nb_idl_duration_ms);
             unsigned int new_ovnnb_cond_seqno =
                         ovsdb_idl_get_condition_seqno(ovnnb_idl_loop.idl);
             if (new_ovnnb_cond_seqno != ovnnb_cond_seqno) {
                 if (!new_ovnnb_cond_seqno) {
                     VLOG_INFO("OVN NB IDL reconnected, force recompute.");
-                    recompute = true;
+                    eng_ctx.recompute = true;
                 }
                 ovnnb_cond_seqno = new_ovnnb_cond_seqno;
             }
 
-            struct ovsdb_idl_txn *ovnsb_txn = run_idl_loop(&ovnsb_idl_loop,
-                                                           "OVN_Southbound");
+            struct ovsdb_idl_txn *ovnsb_txn =
+                    run_idl_loop(&ovnsb_idl_loop, "OVN_Southbound",
+                                 &eng_ctx.sb_idl_duration_ms);
             unsigned int new_ovnsb_cond_seqno =
                         ovsdb_idl_get_condition_seqno(ovnsb_idl_loop.idl);
             if (new_ovnsb_cond_seqno != ovnsb_cond_seqno) {
                 if (!new_ovnsb_cond_seqno) {
                     VLOG_INFO("OVN SB IDL reconnected, force recompute.");
-                    recompute = true;
+                    eng_ctx.recompute = true;
                 }
                 ovnsb_cond_seqno = new_ovnsb_cond_seqno;
             }
@@ -932,11 +938,12 @@  main(int argc, char *argv[])
 
             if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
                 bool activity = false;
-                if (ovnnb_txn && ovnsb_txn) {
+                if (ovnnb_txn && ovnsb_txn &&
+                    inc_proc_northd_can_run(&eng_ctx)) {
                     int64_t loop_start_time = time_wall_msec();
                     activity = inc_proc_northd_run(ovnnb_txn, ovnsb_txn,
-                                                        recompute);
-                    recompute = false;
+                                                   &eng_ctx);
+                    eng_ctx.recompute = false;
                     check_and_add_supported_dhcp_opts_to_sb_db(
                                  ovnsb_txn, ovnsb_idl_loop.idl);
                     check_and_add_supported_dhcpv6_opts_to_sb_db(
@@ -949,7 +956,7 @@  main(int argc, char *argv[])
                                             ovnsb_idl_loop.idl,
                                             ovnnb_txn, ovnsb_txn,
                                             &ovnsb_idl_loop);
-                } else if (!recompute) {
+                } else if (!eng_ctx.recompute) {
                     clear_idl_track = false;
                 }
 
@@ -958,13 +965,13 @@  main(int argc, char *argv[])
                 if (!ovsdb_idl_loop_commit_and_wait(&ovnnb_idl_loop)) {
                     VLOG_INFO("OVNNB commit failed, "
                               "force recompute next time.");
-                    recompute = true;
+                    eng_ctx.recompute = true;
                 }
 
                 if (!ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop)) {
                     VLOG_INFO("OVNSB commit failed, "
                               "force recompute next time.");
-                    recompute = true;
+                    eng_ctx.recompute = true;
                 }
                 run_memory_trimmer(ovnnb_idl_loop.idl, activity);
             } else {
@@ -973,7 +980,7 @@  main(int argc, char *argv[])
                 ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
 
                 /* Force a full recompute next time we become active. */
-                recompute = true;
+                eng_ctx.recompute = true;
             }
         } else {
             /* ovn-northd is paused
@@ -997,7 +1004,7 @@  main(int argc, char *argv[])
             ovsdb_idl_wait(ovnsb_idl_loop.idl);
 
             /* Force a full recompute next time we become active. */
-            recompute = true;
+            eng_ctx.recompute = true;
         }
 
         if (clear_idl_track) {
@@ -1019,6 +1026,9 @@  main(int argc, char *argv[])
         if (nb) {
             interval = smap_get_int(&nb->options, "northd_probe_interval",
                                     interval);
+            eng_ctx.backoff_ms =
+                    smap_get_uint(&nb->options, "northd-backoff-interval-ms",
+                                  0);
         }
         set_idl_probe_interval(ovnnb_idl_loop.idl, ovnnb_db, interval);
         set_idl_probe_interval(ovnsb_idl_loop.idl, ovnsb_db, interval);
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 4fbf4f7e5..bca280367 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -349,6 +349,15 @@ 
         of HWOL compatibility with GDP.
       </column>
 
+      <column name="options" key="northd-backoff-interval-ms">
+        Maximum interval that the northd incremental engine is delayed by
+        in milliseconds. Setting the value to nonzero delays the next northd
+        engine run by the previous run time, capped by the specified value.
+        If the value is zero the engine won't be delayed at all.
+        The recommended period is smaller than 500 ms, beyond that the latency
+        of SB changes would be very noticeable.
+      </column>
+
       <group title="Options for configuring interconnection route advertisement">
         <p>
           These options control how routes are advertised between OVN