diff mbox series

[ovs-dev,RFC] ofproto-dpif-xlate: Recirculate on stack exhaustion.

Message ID 20240223012704.2793017-1-i.maximets@ovn.org
State RFC
Delegated to: Simon Horman
Headers show
Series [ovs-dev,RFC] ofproto-dpif-xlate: Recirculate on stack exhaustion. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets Feb. 23, 2024, 1:26 a.m. UTC
This change attempts to track the actual available stack and force
recircuation if the flow translation logic is about to overflow it.

Unlike the recursion depth counter, this approach allows to track the
actual stack usage and bail early even if the recursion depth is not
reached yet.  This is important because different actions have vastly
different stack requirements and different systems have different
amount of stack allocated per thread by default.

Should work with both GCC and Clang, will likely not work on Windows.

The main thread is not treated fairly in this version.  At least on
Linux the main thread can grow its stack if the limit is dynamically
increased.  That is not normally true for other threads.  However
this patch sticks to initial stack size even for the main thread.

Sending as an RFC for discussion.  For v1 this change should likely
be split in about 4 patches:

1. Preserve resubmits counter across recirculations.  We might need
   that as a bug fix even, because it might be technically possible to
   bypass the resubmit limit by inducing recirculations in the current
   code, e.g. with conntrack.

2. Introduction of the stack frame functions.

3. Actual recirculation on stack exhaution.

4. Enabe tracing over this kind of recirculations.

Tracing also may need a refactor, it doesn't look particularly clean.
And we may want to track resubmits over packet-ins as well.

The behavioral change with this approach is that if we have a large
pipeline that eventually exceeds the number of resubmits, part of this
pipeline can be executed in the datapath then packet recirculates and
will be dropped afterwards.  In contrast, no actions will be executed
today in such scenario, but OVS would also crash with stack overflow,
so I'm not sure if that counts as a behavioral change. :)

Another one is tracking resubmits over recirculations.  That will
impact, for example, forked pipelines on conntrack, since resubmits
before the fork will be accounted for after the fork.

Didn't test much, only checked an infinite resubmit case:

  make -j8
  (ulimit -s 386; make sandbox)
  ovs-vsctl add-br br0
  ovs-vsctl add-port br0 p1
  ovs-ofclt del-flows br0
  (for i in $(seq 0 64); do
      j=$(expr $i + 1);
      echo "table=$i, actions=local,resubmit(,$j),local,resubmit(,$j),local";
   done;
   echo "table=65, actions=resubmit(,0)") > ./resubmits.txt
  ovs-ofclt add-flows br0 ./resubmits.txt
  ovs-appctl ofproto/trace br0 'in_port=1' > ./trace.txt

Depending on compiler flags it manages to get to 4096 resubmit limit
in anywhere from 5 to 16 recirculations.

Thoughts?

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 include/openvswitch/compiler.h | 12 +++++++++
 lib/ovs-thread.c               | 47 ++++++++++++++++++++++++++++++++++
 lib/ovs-thread.h               |  8 ++++++
 lib/sat-math.h                 |  5 +---
 ofproto/ofproto-dpif-rid.h     |  1 +
 ofproto/ofproto-dpif-trace.h   |  2 ++
 ofproto/ofproto-dpif-xlate.c   | 32 +++++++++++++++++++++++
 vswitchd/ovs-vswitchd.c        |  1 +
 8 files changed, 104 insertions(+), 4 deletions(-)

Comments

Eelco Chaudron March 22, 2024, 2:29 p.m. UTC | #1
On 23 Feb 2024, at 2:26, Ilya Maximets wrote:

> This change attempts to track the actual available stack and force
> recircuation if the flow translation logic is about to overflow it.

Thanks Ilya, for taking a stab at this!

> Unlike the recursion depth counter, this approach allows to track the
> actual stack usage and bail early even if the recursion depth is not
> reached yet.  This is important because different actions have vastly
> different stack requirements and different systems have different
> amount of stack allocated per thread by default.
>
> Should work with both GCC and Clang, will likely not work on Windows.
>
> The main thread is not treated fairly in this version.  At least on
> Linux the main thread can grow its stack if the limit is dynamically
> increased.  That is not normally true for other threads.  However
> this patch sticks to initial stack size even for the main thread.

Was going over the code first not reading the commit message, ignore my comment below ;)

> Sending as an RFC for discussion.  For v1 this change should likely
> be split in about 4 patches:
>
> 1. Preserve resubmits counter across recirculations.  We might need
>    that as a bug fix even, because it might be technically possible to
>    bypass the resubmit limit by inducing recirculations in the current
>    code, e.g. with conntrack.
>
> 2. Introduction of the stack frame functions.
>
> 3. Actual recirculation on stack exhaution.
>
> 4. Enabe tracing over this kind of recirculations.
>
> Tracing also may need a refactor, it doesn't look particularly clean.
> And we may want to track resubmits over packet-ins as well.
>
> The behavioral change with this approach is that if we have a large
> pipeline that eventually exceeds the number of resubmits, part of this
> pipeline can be executed in the datapath then packet recirculates and
> will be dropped afterwards.  In contrast, no actions will be executed
> today in such scenario, but OVS would also crash with stack overflow,
> so I'm not sure if that counts as a behavioral change. :)
>
> Another one is tracking resubmits over recirculations.  That will
> impact, for example, forked pipelines on conntrack, since resubmits
> before the fork will be accounted for after the fork.
>
> Didn't test much, only checked an infinite resubmit case:
>
>   make -j8
>   (ulimit -s 386; make sandbox)
>   ovs-vsctl add-br br0
>   ovs-vsctl add-port br0 p1
>   ovs-ofclt del-flows br0
>   (for i in $(seq 0 64); do
>       j=$(expr $i + 1);
>       echo "table=$i, actions=local,resubmit(,$j),local,resubmit(,$j),local";
>    done;
>    echo "table=65, actions=resubmit(,0)") > ./resubmits.txt
>   ovs-ofclt add-flows br0 ./resubmits.txt
>   ovs-appctl ofproto/trace br0 'in_port=1' > ./trace.txt
>
> Depending on compiler flags it manages to get to 4096 resubmit limit
> in anywhere from 5 to 16 recirculations.
>
> Thoughts?

I think the approach is good, and the code seems ok, although it may require some cleanup as you've mentioned. The whole xlate code puzzles me every time I look at it (it always seems to work differently than I thought it was ;)

CCing Mike for some input, as he has worked on stack optimizations in this area before.

Cheers,

Eelco

> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  include/openvswitch/compiler.h | 12 +++++++++
>  lib/ovs-thread.c               | 47 ++++++++++++++++++++++++++++++++++
>  lib/ovs-thread.h               |  8 ++++++
>  lib/sat-math.h                 |  5 +---
>  ofproto/ofproto-dpif-rid.h     |  1 +
>  ofproto/ofproto-dpif-trace.h   |  2 ++
>  ofproto/ofproto-dpif-xlate.c   | 32 +++++++++++++++++++++++
>  vswitchd/ovs-vswitchd.c        |  1 +
>  8 files changed, 104 insertions(+), 4 deletions(-)
>
> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
> index 878c5c6a7..8da2e6eef 100644
> --- a/include/openvswitch/compiler.h
> +++ b/include/openvswitch/compiler.h
> @@ -26,6 +26,9 @@
>  #ifndef __has_extension
>    #define __has_extension(x) 0
>  #endif
> +#ifndef __has_builtin
> +  #define __has_builtin(x) 0
> +#endif
>
>  /* To make OVS_NO_RETURN portable across gcc/clang and MSVC, it should be
>   * added at the beginning of the function declaration. */
> @@ -300,5 +303,14 @@
>  #define BUILD_ASSERT_DECL_GCCONLY(EXPR) ((void) 0)
>  #endif
>
> +/* OVS_FRAME_ADDRESS can be used to get the address of the current stack frame.
> + * Note: Attempts to get address of any frame beside the current one (0) are
> + * dangerous and can lead to crashes according to GCC documentation.  */
> +#if __has_builtin(__builtin_frame_address)
> +#define OVS_FRAME_ADDRESS() ((char *) __builtin_frame_address(0))
> +#else
> +#define OVS_FRAME_ADDRESS() ((char *) 0)
> +#endif
> +
>
>  #endif /* compiler.h */
> diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
> index f80008061..637ce27ec 100644
> --- a/lib/ovs-thread.c
> +++ b/lib/ovs-thread.c
> @@ -407,6 +407,52 @@ ovsthread_id_init(void)
>      return *ovsthread_id_get() = atomic_count_inc(&next_id);
>  }
>
> +DEFINE_STATIC_PER_THREAD_DATA(uintptr_t, ovsthread_stack_base, 0);
> +DEFINE_STATIC_PER_THREAD_DATA(size_t, ovsthread_stack_size, 0);
> +
> +void
> +ovsthread_stack_init(void)
> +{
> +    pthread_attr_t attr;
> +    size_t stacksize;
> +    int error;
> +
> +    ovs_assert(*ovsthread_stack_base_get() == 0);
> +    *ovsthread_stack_base_get() = (uintptr_t) OVS_FRAME_ADDRESS();
> +
> +    pthread_attr_init(&attr);
> +
> +    error = pthread_attr_getstacksize(&attr, &stacksize);
> +    if (error) {
> +        VLOG_ABORT("pthread_attr_getstacksize failed: %s",
> +                   ovs_strerror(error));
> +    }
> +    *ovsthread_stack_size_get() = stacksize;

The man pages discuss the main thread's ability to increase its stack size. However, for OVS, this detail might not be significant since the issue lies with the other threads.

> +    pthread_attr_destroy(&attr);
> +}
> +
> +bool
> +ovsthread_low_on_stack(void)
> +{
> +    uintptr_t curr = (uintptr_t) OVS_FRAME_ADDRESS();
> +    uintptr_t base = *ovsthread_stack_base_get();
> +    size_t size = *ovsthread_stack_size_get();
> +    size_t used;
> +
> +    if (!curr || !base || !size) {
> +        /* Either not initialized or not supported. */
> +        return false;
> +    }
> +
> +    used = (base > curr) ? base - curr : curr - base;
> +
> +    /* Consider 'low' as less than a 100 KB. */
> +    if (size < used + 100 * 1024) {
> +        return true;
> +    }
> +    return false;
> +}
> +
>  static void *
>  ovsthread_wrapper(void *aux_)
>  {
> @@ -415,6 +461,7 @@ ovsthread_wrapper(void *aux_)
>      unsigned int id;
>
>      id = ovsthread_id_init();
> +    ovsthread_stack_init();
>
>      aux = *auxp;
>      free(auxp);
> diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
> index aac5e19c9..e81f28970 100644
> --- a/lib/ovs-thread.h
> +++ b/lib/ovs-thread.h
> @@ -474,6 +474,14 @@ ovsthread_id_self(void)
>      return id;
>  }
>  
> +/* Enables use of ovsthread_low_on_stack().  Must be called by a new thread
> + * early after the thread creation. */
> +void ovsthread_stack_init(void);

Maybe add a note it’s already done by ovs_thread_create().

> +/* Returns 'true' is the current thread used up most of its stack space.
> + * Can be used, for example, to check if recursion has to be stopped. */
> +bool ovsthread_low_on_stack(void);
> +
>  /* Simulated global counter.
>   *
>   * Incrementing such a counter is meant to be cheaper than incrementing a
> diff --git a/lib/sat-math.h b/lib/sat-math.h
> index d66872387..34b939277 100644
> --- a/lib/sat-math.h
> +++ b/lib/sat-math.h
> @@ -18,12 +18,9 @@
>  #define SAT_MATH_H 1
>
>  #include <limits.h>
> +#include "openvswitch/compiler.h"
>  #include "openvswitch/util.h"
>
> -#ifndef __has_builtin
> -#define __has_builtin(x) 0
> -#endif
> -
>  /* Returns x + y, clamping out-of-range results into the range of the return
>   * type. */
>  static inline unsigned int
> diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h
> index 4df630c62..9488a3850 100644
> --- a/ofproto/ofproto-dpif-rid.h
> +++ b/ofproto/ofproto-dpif-rid.h
> @@ -151,6 +151,7 @@ struct frozen_state {
>      struct frozen_metadata metadata; /* Flow metadata. */
>      uint8_t *stack;               /* Stack if any. */
>      size_t stack_size;
> +    size_t resubmits;             /* Number of resubmits prior to freeze. */
>      mirror_mask_t mirrors;        /* Mirrors already output. */
>      bool conntracked;             /* Conntrack occurred prior to freeze. */
>      bool was_mpls;                /* MPLS packet */
> diff --git a/ofproto/ofproto-dpif-trace.h b/ofproto/ofproto-dpif-trace.h
> index f579a5ca4..1e34cdb95 100644
> --- a/ofproto/ofproto-dpif-trace.h
> +++ b/ofproto/ofproto-dpif-trace.h
> @@ -51,9 +51,11 @@ enum oftrace_node_type {
>
>  /* Reason why a flow is in a recirculation queue. */
>  enum oftrace_recirc_type {
> +    OFT_RECIRC_NONE = 0,
>      OFT_RECIRC_CONNTRACK,
>      OFT_RECIRC_MPLS,
>      OFT_RECIRC_BOND,
> +    OFT_RECIRC_STACK_EXHAUSTED,
>  };
>
>  /* A node within a trace. */
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 89f183182..e5e17b0a6 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -71,6 +71,7 @@
>  COVERAGE_DEFINE(xlate_actions);
>  COVERAGE_DEFINE(xlate_actions_oversize);
>  COVERAGE_DEFINE(xlate_actions_too_many_output);
> +COVERAGE_DEFINE(xlate_stack_exhausted);
>
>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_xlate);
>
> @@ -381,6 +382,7 @@ struct xlate_ctx {
>      * case at that point.
>      */
>      bool freezing;
> +    enum oftrace_recirc_type recirc_type; /* Recirculation type for tracing. */
>      bool recirc_update_dp_hash;    /* Generated recirculation will be preceded
>                                      * by datapath HASH action to get an updated
>                                      * dp_hash after recirculation. */
> @@ -495,6 +497,7 @@ ctx_cancel_freeze(struct xlate_ctx *ctx)
>  {
>      if (ctx->freezing) {
>          ctx->freezing = false;
> +        ctx->recirc_type = OFT_RECIRC_NONE;
>          ctx->recirc_update_dp_hash = false;
>          ofpbuf_clear(&ctx->frozen_actions);
>          ctx->frozen_actions.header = NULL;
> @@ -4607,6 +4610,12 @@ xlate_resubmit_resource_check(struct xlate_ctx *ctx)
>      } else if (ctx->stack.size >= 65536) {
>          xlate_report_error(ctx, "resubmits yielded over 64 kB of stack");
>          ctx->error = XLATE_STACK_TOO_DEEP;
> +    } else if (ovsthread_low_on_stack()) {
> +        xlate_report(ctx, OFT_WARN, "Thread stack exhausted, "
> +                     "recirculating to avoid overflow.");
> +        COVERAGE_INC(xlate_stack_exhausted);
> +        ctx->recirc_type = OFT_RECIRC_STACK_EXHAUSTED;
> +        ctx_trigger_freeze(ctx);
>      } else {
>          return true;
>      }
> @@ -5125,6 +5134,7 @@ xlate_controller_action(struct xlate_ctx *ctx, int len,
>          .ofproto_uuid = ctx->xbridge->ofproto->uuid,
>          .stack = ctx->stack.data,
>          .stack_size = ctx->stack.size,
> +        .resubmits = ctx->resubmits,
>          .mirrors = ctx->mirrors,
>          .conntracked = ctx->conntracked,
>          .was_mpls = ctx->was_mpls,
> @@ -5200,6 +5210,7 @@ finish_freezing__(struct xlate_ctx *ctx, uint8_t table)
>          .ofproto_uuid = ctx->xbridge->ofproto->uuid,
>          .stack = ctx->stack.data,
>          .stack_size = ctx->stack.size,
> +        .resubmits = ctx->resubmits,
>          .mirrors = ctx->mirrors,
>          .conntracked = ctx->conntracked,
>          .was_mpls = ctx->was_mpls,
> @@ -5248,6 +5259,24 @@ finish_freezing__(struct xlate_ctx *ctx, uint8_t table)
>              act_hash->hash_basis = ctx->dp_hash_basis;
>          }
>          nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, recirc_id);
> +
> +        if (OVS_UNLIKELY(ctx->xin->trace) && recirc_id &&
> +            ctx->recirc_type == OFT_RECIRC_STACK_EXHAUSTED) {
> +            if (oftrace_add_recirc_node(ctx->xin->recirc_queue,
> +                                        ctx->recirc_type,
> +                                        &ctx->xin->flow,
> +                                        ctx->ct_nat_action,
> +                                        ctx->xin->packet,
> +                                        recirc_id,
> +                                        ctx->xin->flow.ct_zone)) {
> +                xlate_report(ctx, OFT_DETAIL,
> +                             "Packet is sent for recirculation, "
> +                             "recric_id = 0x%"PRIx32".", recirc_id);
> +            } else {
> +                xlate_report(ctx, OFT_DETAIL, "Failed to trace recirculation "
> +                             "with recirc_id = 0x%"PRIx32".", recirc_id);
> +            }
> +        }
>      }
>
>      /* Undo changes done by freezing. */
> @@ -8062,6 +8091,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
>          .mirrors = 0,
>
>          .freezing = false,
> +        .recirc_type = OFT_RECIRC_NONE,
>          .recirc_update_dp_hash = false,
>          .frozen_actions = OFPBUF_STUB_INITIALIZER(frozen_actions_stub),
>          .pause = NULL,
> @@ -8139,6 +8169,8 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
>              clear_conntrack(&ctx);
>          }
>
> +        ctx.resubmits = state->resubmits;
> +
>          /* Restore pipeline metadata. May change flow's in_port and other
>           * metadata to the values that existed when freezing was triggered. */
>          frozen_metadata_to_flow(&ctx.xbridge->ofproto->up,
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index 273af9f5d..326e5b9f7 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -93,6 +93,7 @@ main(int argc, char *argv[])
>      fatal_ignore_sigpipe();
>
>      daemonize_start(true, hw_rawio_access);
> +    ovsthread_stack_init();
>
>      if (want_mlockall) {
>  #ifdef HAVE_MLOCKALL
> -- 
> 2.43.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index 878c5c6a7..8da2e6eef 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -26,6 +26,9 @@ 
 #ifndef __has_extension
   #define __has_extension(x) 0
 #endif
+#ifndef __has_builtin
+  #define __has_builtin(x) 0
+#endif
 
 /* To make OVS_NO_RETURN portable across gcc/clang and MSVC, it should be
  * added at the beginning of the function declaration. */
@@ -300,5 +303,14 @@ 
 #define BUILD_ASSERT_DECL_GCCONLY(EXPR) ((void) 0)
 #endif
 
+/* OVS_FRAME_ADDRESS can be used to get the address of the current stack frame.
+ * Note: Attempts to get address of any frame beside the current one (0) are
+ * dangerous and can lead to crashes according to GCC documentation.  */
+#if __has_builtin(__builtin_frame_address)
+#define OVS_FRAME_ADDRESS() ((char *) __builtin_frame_address(0))
+#else
+#define OVS_FRAME_ADDRESS() ((char *) 0)
+#endif
+
 
 #endif /* compiler.h */
diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index f80008061..637ce27ec 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -407,6 +407,52 @@  ovsthread_id_init(void)
     return *ovsthread_id_get() = atomic_count_inc(&next_id);
 }
 
+DEFINE_STATIC_PER_THREAD_DATA(uintptr_t, ovsthread_stack_base, 0);
+DEFINE_STATIC_PER_THREAD_DATA(size_t, ovsthread_stack_size, 0);
+
+void
+ovsthread_stack_init(void)
+{
+    pthread_attr_t attr;
+    size_t stacksize;
+    int error;
+
+    ovs_assert(*ovsthread_stack_base_get() == 0);
+    *ovsthread_stack_base_get() = (uintptr_t) OVS_FRAME_ADDRESS();
+
+    pthread_attr_init(&attr);
+
+    error = pthread_attr_getstacksize(&attr, &stacksize);
+    if (error) {
+        VLOG_ABORT("pthread_attr_getstacksize failed: %s",
+                   ovs_strerror(error));
+    }
+    *ovsthread_stack_size_get() = stacksize;
+    pthread_attr_destroy(&attr);
+}
+
+bool
+ovsthread_low_on_stack(void)
+{
+    uintptr_t curr = (uintptr_t) OVS_FRAME_ADDRESS();
+    uintptr_t base = *ovsthread_stack_base_get();
+    size_t size = *ovsthread_stack_size_get();
+    size_t used;
+
+    if (!curr || !base || !size) {
+        /* Either not initialized or not supported. */
+        return false;
+    }
+
+    used = (base > curr) ? base - curr : curr - base;
+
+    /* Consider 'low' as less than a 100 KB. */
+    if (size < used + 100 * 1024) {
+        return true;
+    }
+    return false;
+}
+
 static void *
 ovsthread_wrapper(void *aux_)
 {
@@ -415,6 +461,7 @@  ovsthread_wrapper(void *aux_)
     unsigned int id;
 
     id = ovsthread_id_init();
+    ovsthread_stack_init();
 
     aux = *auxp;
     free(auxp);
diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
index aac5e19c9..e81f28970 100644
--- a/lib/ovs-thread.h
+++ b/lib/ovs-thread.h
@@ -474,6 +474,14 @@  ovsthread_id_self(void)
     return id;
 }
 
+/* Enables use of ovsthread_low_on_stack().  Must be called by a new thread
+ * early after the thread creation. */
+void ovsthread_stack_init(void);
+
+/* Returns 'true' is the current thread used up most of its stack space.
+ * Can be used, for example, to check if recursion has to be stopped. */
+bool ovsthread_low_on_stack(void);
+
 /* Simulated global counter.
  *
  * Incrementing such a counter is meant to be cheaper than incrementing a
diff --git a/lib/sat-math.h b/lib/sat-math.h
index d66872387..34b939277 100644
--- a/lib/sat-math.h
+++ b/lib/sat-math.h
@@ -18,12 +18,9 @@ 
 #define SAT_MATH_H 1
 
 #include <limits.h>
+#include "openvswitch/compiler.h"
 #include "openvswitch/util.h"
 
-#ifndef __has_builtin
-#define __has_builtin(x) 0
-#endif
-
 /* Returns x + y, clamping out-of-range results into the range of the return
  * type. */
 static inline unsigned int
diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h
index 4df630c62..9488a3850 100644
--- a/ofproto/ofproto-dpif-rid.h
+++ b/ofproto/ofproto-dpif-rid.h
@@ -151,6 +151,7 @@  struct frozen_state {
     struct frozen_metadata metadata; /* Flow metadata. */
     uint8_t *stack;               /* Stack if any. */
     size_t stack_size;
+    size_t resubmits;             /* Number of resubmits prior to freeze. */
     mirror_mask_t mirrors;        /* Mirrors already output. */
     bool conntracked;             /* Conntrack occurred prior to freeze. */
     bool was_mpls;                /* MPLS packet */
diff --git a/ofproto/ofproto-dpif-trace.h b/ofproto/ofproto-dpif-trace.h
index f579a5ca4..1e34cdb95 100644
--- a/ofproto/ofproto-dpif-trace.h
+++ b/ofproto/ofproto-dpif-trace.h
@@ -51,9 +51,11 @@  enum oftrace_node_type {
 
 /* Reason why a flow is in a recirculation queue. */
 enum oftrace_recirc_type {
+    OFT_RECIRC_NONE = 0,
     OFT_RECIRC_CONNTRACK,
     OFT_RECIRC_MPLS,
     OFT_RECIRC_BOND,
+    OFT_RECIRC_STACK_EXHAUSTED,
 };
 
 /* A node within a trace. */
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 89f183182..e5e17b0a6 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -71,6 +71,7 @@ 
 COVERAGE_DEFINE(xlate_actions);
 COVERAGE_DEFINE(xlate_actions_oversize);
 COVERAGE_DEFINE(xlate_actions_too_many_output);
+COVERAGE_DEFINE(xlate_stack_exhausted);
 
 VLOG_DEFINE_THIS_MODULE(ofproto_dpif_xlate);
 
@@ -381,6 +382,7 @@  struct xlate_ctx {
     * case at that point.
     */
     bool freezing;
+    enum oftrace_recirc_type recirc_type; /* Recirculation type for tracing. */
     bool recirc_update_dp_hash;    /* Generated recirculation will be preceded
                                     * by datapath HASH action to get an updated
                                     * dp_hash after recirculation. */
@@ -495,6 +497,7 @@  ctx_cancel_freeze(struct xlate_ctx *ctx)
 {
     if (ctx->freezing) {
         ctx->freezing = false;
+        ctx->recirc_type = OFT_RECIRC_NONE;
         ctx->recirc_update_dp_hash = false;
         ofpbuf_clear(&ctx->frozen_actions);
         ctx->frozen_actions.header = NULL;
@@ -4607,6 +4610,12 @@  xlate_resubmit_resource_check(struct xlate_ctx *ctx)
     } else if (ctx->stack.size >= 65536) {
         xlate_report_error(ctx, "resubmits yielded over 64 kB of stack");
         ctx->error = XLATE_STACK_TOO_DEEP;
+    } else if (ovsthread_low_on_stack()) {
+        xlate_report(ctx, OFT_WARN, "Thread stack exhausted, "
+                     "recirculating to avoid overflow.");
+        COVERAGE_INC(xlate_stack_exhausted);
+        ctx->recirc_type = OFT_RECIRC_STACK_EXHAUSTED;
+        ctx_trigger_freeze(ctx);
     } else {
         return true;
     }
@@ -5125,6 +5134,7 @@  xlate_controller_action(struct xlate_ctx *ctx, int len,
         .ofproto_uuid = ctx->xbridge->ofproto->uuid,
         .stack = ctx->stack.data,
         .stack_size = ctx->stack.size,
+        .resubmits = ctx->resubmits,
         .mirrors = ctx->mirrors,
         .conntracked = ctx->conntracked,
         .was_mpls = ctx->was_mpls,
@@ -5200,6 +5210,7 @@  finish_freezing__(struct xlate_ctx *ctx, uint8_t table)
         .ofproto_uuid = ctx->xbridge->ofproto->uuid,
         .stack = ctx->stack.data,
         .stack_size = ctx->stack.size,
+        .resubmits = ctx->resubmits,
         .mirrors = ctx->mirrors,
         .conntracked = ctx->conntracked,
         .was_mpls = ctx->was_mpls,
@@ -5248,6 +5259,24 @@  finish_freezing__(struct xlate_ctx *ctx, uint8_t table)
             act_hash->hash_basis = ctx->dp_hash_basis;
         }
         nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, recirc_id);
+
+        if (OVS_UNLIKELY(ctx->xin->trace) && recirc_id &&
+            ctx->recirc_type == OFT_RECIRC_STACK_EXHAUSTED) {
+            if (oftrace_add_recirc_node(ctx->xin->recirc_queue,
+                                        ctx->recirc_type,
+                                        &ctx->xin->flow,
+                                        ctx->ct_nat_action,
+                                        ctx->xin->packet,
+                                        recirc_id,
+                                        ctx->xin->flow.ct_zone)) {
+                xlate_report(ctx, OFT_DETAIL,
+                             "Packet is sent for recirculation, "
+                             "recric_id = 0x%"PRIx32".", recirc_id);
+            } else {
+                xlate_report(ctx, OFT_DETAIL, "Failed to trace recirculation "
+                             "with recirc_id = 0x%"PRIx32".", recirc_id);
+            }
+        }
     }
 
     /* Undo changes done by freezing. */
@@ -8062,6 +8091,7 @@  xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
         .mirrors = 0,
 
         .freezing = false,
+        .recirc_type = OFT_RECIRC_NONE,
         .recirc_update_dp_hash = false,
         .frozen_actions = OFPBUF_STUB_INITIALIZER(frozen_actions_stub),
         .pause = NULL,
@@ -8139,6 +8169,8 @@  xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
             clear_conntrack(&ctx);
         }
 
+        ctx.resubmits = state->resubmits;
+
         /* Restore pipeline metadata. May change flow's in_port and other
          * metadata to the values that existed when freezing was triggered. */
         frozen_metadata_to_flow(&ctx.xbridge->ofproto->up,
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 273af9f5d..326e5b9f7 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -93,6 +93,7 @@  main(int argc, char *argv[])
     fatal_ignore_sigpipe();
 
     daemonize_start(true, hw_rawio_access);
+    ovsthread_stack_init();
 
     if (want_mlockall) {
 #ifdef HAVE_MLOCKALL