Patchwork C-family stack check for threads

Submitter: Thomas Klein
Date: March 20, 2011, 3:45 p.m.
Message ID: <4D86210F.9020700@web.de>
Permalink: /patch/87656/
State: New

Comments

Thomas Klein - March 20, 2011, 3:45 p.m.
Hi

I would like to have a stack check for threads that each have only a small
stack.
(I'm using an ARM Cortex-M3 microcontroller with a stack size of 1 KByte
per thread.)
Each thread has its own limit address.
The thread scheduler can then calculate the limit and store this value in
a global variable.
The compiler can generate code to check the stack for overflow at
function entry.
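A minimal sketch of the scheduler side, in C. The "struct thread" layout and "scheduler_set_stack_limit()" are hypothetical (they are not part of the patch); only the name of the global has to match the "-fstack-limit-symbol" argument. It assumes a downward-growing stack whose limit is the lowest address of the thread's stack area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread control block (not part of the patch). */
struct thread {
    uint8_t *stack_base;  /* lowest address of the thread's stack area */
    size_t stack_size;    /* e.g. 1024 bytes per thread */
};

/* Global read by the prologue check when compiling with
   -fstack-check=indirect -fstack-limit-symbol=stack_limit_var.  */
uintptr_t stack_limit_var;

/* Hypothetical hook, called by the scheduler just before it switches to
   'next'.  The stack grows downward, so the limit is the base address.  */
void scheduler_set_stack_limit(const struct thread *next)
{
    assert(next != NULL);
    stack_limit_var = (uintptr_t)next->stack_base;
}
```

Because the limit lives in a single global, the check emitted in every function prologue stays the same; only the scheduler has to know about the per-thread stack areas.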
In principle, this can be done as follows:
   - push registers as usual
   - figure out whether one or two work registers can be used directly,
     without an extra push
   - if not enough registers are found, push the required work registers
     onto the stack
   - load the limit address into the first work register
   - load the value at the limit address (into the same register)
   - if the stack pointer will be moved to extend the stack (e.g. for local
     variables), load this size value too and add it to the limit (the
     second work register can be used here)
   - compare for overflow
   - if an overflow occurs, "call" the stack_failure function
   - pop the work registers that were pushed before
   - continue the function prologue as usual, e.g. extend the stack pointer
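The comparison in the steps above can be modelled in C as follows. This is a sketch of the logic, not the code the patch emits; the function name is hypothetical, and the wrap-around test is an extra safeguard that the emitted assembly does not contain. It assumes a full-descending stack, as on ARM, and 32-bit addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the prologue may go on to subtract 'frame_size' from 'sp'.
   sp         : stack pointer after the usual register pushes
   limit      : lowest usable address of the current thread's stack
   frame_size : bytes about to be allocated for local variables          */
bool stack_check_ok(uint32_t sp, uint32_t limit, uint32_t frame_size)
{
    /* Mirrors "add rlim, rlim, ramount; cmp sp, rlim; bhs ok":
       pass if sp >= limit + frame_size, compared unsigned.              */
    uint32_t needed = limit + frame_size;

    /* Extra safeguard (not in the emitted code): the sum wrapped around. */
    if (needed < limit)
        return false;
    return sp >= needed;
}
```

For example, with a hypothetical limit of 0x20000000 and sp = 0x20000400, a function may allocate up to 0x400 bytes; asking for one more word fails the check and would lead to the stack_failure call.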

The ARM target has an option "-mapcs-stack-check", but it is effectively
non-functional (the implementation is missing).
There are also architecture-independent options, such as
"-fstack-check=generic", "-fstack-limit-symbol=current_stack_limit" or
"-fstack-limit-register=r6", that can be used.

The generic stack check does a probe at the end of the function prologue
(e.g. by writing 12K ahead of the current stack pointer position).
If this stack space is not available, the probe may generate a fault.
This requires that the CPU has an MPU or an MMU.
For machines with a small memory space, an additional mechanism should be
available.
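To see why probing does not help on such a machine, the following sketch computes where a 12K probe lands relative to one of the 1 KByte thread stacks. The layout (thread stacks placed back to back starting at a given base address) and both helper names are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_SIZE 1024u  /* 1 KByte per thread, as above */

/* True if 'addr' lies in thread t's stack, assuming the thread stacks
   are placed back to back starting at 'stacks_base' (hypothetical).  */
bool in_own_stack(uint32_t stacks_base, unsigned t, uint32_t addr)
{
    uint32_t lo = stacks_base + t * STACK_SIZE;
    return addr >= lo && addr < lo + STACK_SIZE;
}

/* Address touched by a probe 'offset' bytes below the stack pointer.
   Without an MPU/MMU nothing faults; the write simply lands there.   */
uint32_t probe_address(uint32_t sp, uint32_t offset)
{
    return sp - offset;
}
```

A probe 12K below a stack pointer that sits inside a 1 KByte stack always ends up outside that stack, so without memory protection it silently overwrites other data instead of trapping.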

The option "-fstack-check" can be extended with the values "direct" and
"indirect" to emit compare code in the function prologue.
If "direct" is given, the address of the "-fstack-limit-symbol" symbol
is the limit itself.
If "indirect" is given, "-fstack-limit-symbol" names a global variable
that needs to be read before the compare.
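The difference between the two proposed modes can be sketched in C. Here "stack_limit_sym" stands in for a symbol that would normally be defined by the linker script (and declared as "extern char stack_limit_sym[];"), and "stack_limit_var" for the global that the scheduler updates; both names and the helper functions are hypothetical.

```c
#include <stdint.h>

/* Stand-in for a linker-defined symbol: only its address matters. */
char stack_limit_sym[1];

/* Stand-in for the global variable written by the scheduler. */
uintptr_t stack_limit_var;

/* -fstack-check=direct: the symbol's address is the limit
   (one "ldr rX, =stack_limit_sym").                                 */
uintptr_t limit_direct(void)
{
    return (uintptr_t)stack_limit_sym;
}

/* -fstack-check=indirect: the symbol names a variable holding the limit
   ("ldr rX, =stack_limit_var" followed by "ldr rX, [rX]").          */
uintptr_t limit_indirect(void)
{
    return stack_limit_var;
}
```

"direct" suits a single fixed stack, while "indirect" costs one extra load but lets the limit change on every context switch.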

I have added a proposal that shows how this behavior can be integrated
for the ARM architecture.

The generated code looks like this,
e.g. when using "-fstack-check=indirect -fstack-limit-symbol=stack_limit_var":
->    push {r0}
->    ldr  r0, =stack_limit_var
->    ldr  r0, [r0]
->    cmp  sp, r0
->    bhs  1f
->    push {lr}
->    bl    __thumb_stack_failure    @ stack check
->.align
->.ltorg
->1:
->    pop    {r0}

Regards
   Thomas Klein

gcc/ChangeLog

2011-03-20  Thomas Klein <th.r.klein@web.de>
     * opts.c (common_handle_option): Introduce new parameters "direct" and
     "indirect".
     * flag-types.h (enum stack_check_type): Likewise.

     * explow.c (allocate_dynamic_stack_space):
     - Suppress stack probing if parameter "direct" or "indirect" is given,
     or if a stack limit is given.
     - Do an additional read of the limit value if parameter "indirect" and
     a stack-limit symbol are given.
     - Emit a call to a stack_failure function [as an alternative to a trap
     call].
     (probe_stack_range): If allowed to override the range probe, emit
     generic_limit_check_stack.

     * config/arm/arm.c (stack_check_output_function): New function to write
     the stack check code sequence to the assembler file.
     (stack_check_work_registers): New function to find possible work
     registers [only used by "stack check"].
     (arm_expand_prologue): Stack check integration for ARM and Thumb-2.
     (thumb1_output_function_prologue): Stack check integration for Thumb-1.

     * config/arm/arm.md (probe_stack): Do not emit code when parameter
     "direct" or "indirect" is given; otherwise emit move code as in
     gcc/explow.c [function emit_stack_probe].
     (probe_stack_done): Dummy to make sure probe_stack insns are not
     optimized away.
     (generic_limit_check_stack): If a stack limit and parameter "generic"
     are given, use the limit the same way as in function
     allocate_dynamic_stack_space.
     (stack_check): ARM/Thumb-2 insn to output function
     stack_check_output_function.
     (stack_failure): Failure call used in function
     allocate_dynamic_stack_space [similar to a trap, but avoids a conflict
     with builtin_trap].

Thomas Klein - March 21, 2011, 12:16 p.m.
* Florian Weimer:
> * Thomas Klein:
>
>> e.g. if using "-fstack-check=indirect -fstack-limit-symbol=stack_limit_var"
> Have you looked at -fsplit-stack? It emits quite similar code.
>

Yes, I have seen this.
But the switches -fstack-check and -fsplit-stack are meant for different cases.
"Stack check" (with a given limit) is used to detect whether a stack
overflow has already taken place or will take place within the
current function call.
"Split stack", in contrast, is used to dynamically allocate more space
(from the heap) if the current stack is too small.
This mechanism is currently only supported on x86 machines (with a
UNIX-like environment).

In my case I'm using an ARM microcontroller (without an MMU) with 20 KBytes
of RAM in total.
I have 10 threads with a 1K stack each.
Global variables take 8K and the heap has 2K.

I simply need a check method to detect whether a function is going to write
into the wrong stack area.

Regards
   Thomas

Patch

Index: gcc/opts.c
===================================================================
--- gcc/opts.c    (revision 171194)
+++ gcc/opts.c    (working copy)
@@ -1618,6 +1618,12 @@  common_handle_option (struct gcc_options *opts,
                 : STACK_CHECK_STATIC_BUILTIN
                   ? STATIC_BUILTIN_STACK_CHECK
                   : GENERIC_STACK_CHECK;
+      else if (!strcmp (arg, "indirect"))
+    /* This is another stack checking method.  */
+    opts->x_flag_stack_check = INDIRECT_STACK_CHECK;
+      else if (!strcmp (arg, "direct"))
+    /* This is another stack checking method.  */
+    opts->x_flag_stack_check = DIRECT_STACK_CHECK;
        else
      warning_at (loc, 0, "unknown stack check parameter \"%s\"", arg);
        break;
Index: gcc/flag-types.h
===================================================================
--- gcc/flag-types.h    (revision 171194)
+++ gcc/flag-types.h    (working copy)
@@ -153,7 +153,15 @@  enum stack_check_type

    /* Check the stack and entirely rely on the target configuration
       files, i.e. do not use the generic mechanism at all.  */
-  FULL_BUILTIN_STACK_CHECK
+  FULL_BUILTIN_STACK_CHECK,
+
+  /* Check the stack (if possible) before allocating local variables at
+     each function entry.  The stack limit is given directly, e.g. by the
+     address of a symbol.  */
+  DIRECT_STACK_CHECK,
+  /* Check the stack (if possible) before allocating local variables at
+     each function entry.  The stack limit is given by a global variable.  */
+  INDIRECT_STACK_CHECK
  };

  /* Names for the different levels of -Wstrict-overflow=N.  The numeric
Index: gcc/explow.c
===================================================================
--- gcc/explow.c    (revision 171194)
+++ gcc/explow.c    (working copy)
@@ -1365,7 +1365,12 @@  allocate_dynamic_stack_space (rtx size, unsigned s

   /* If needed, check that we have the required amount of stack.  Take into
       account what has already been checked.  */
-  if (STACK_CHECK_MOVING_SP)
+  if (  STACK_CHECK_MOVING_SP
+#ifdef HAVE_generic_limit_check_stack
+     || crtl->limit_stack
+#endif
+     || flag_stack_check == DIRECT_STACK_CHECK
+     || flag_stack_check == INDIRECT_STACK_CHECK)
      ;
    else if (flag_stack_check == GENERIC_STACK_CHECK)
      probe_stack_range (STACK_OLD_CHECK_PROTECT + STACK_CHECK_MAX_FRAME_SIZE,
@@ -1407,19 +1412,32 @@  allocate_dynamic_stack_space (rtx size, unsigned s
        /* Check stack bounds if necessary.  */
        if (crtl->limit_stack)
      {
+          rtx limit_rtx;
        rtx available;
        rtx space_available = gen_label_rtx ();
+          if (  GET_CODE (stack_limit_rtx) == SYMBOL_REF
+ && flag_stack_check == INDIRECT_STACK_CHECK)
+            limit_rtx = expand_unop (Pmode, mov_optab,
+                    gen_rtx_MEM (Pmode, stack_limit_rtx),
+                    NULL_RTX, 1);
+          else
+            limit_rtx = stack_limit_rtx;
  #ifdef STACK_GROWS_DOWNWARD
        available = expand_binop (Pmode, sub_optab,
-                    stack_pointer_rtx, stack_limit_rtx,
+                    stack_pointer_rtx, limit_rtx,
                      NULL_RTX, 1, OPTAB_WIDEN);
  #else
        available = expand_binop (Pmode, sub_optab,
-                    stack_limit_rtx, stack_pointer_rtx,
+                    limit_rtx, stack_pointer_rtx,
                      NULL_RTX, 1, OPTAB_WIDEN);
  #endif
        emit_cmp_and_jump_insns (available, size, GEU, NULL_RTX, Pmode, 1,
                     space_available);
+#ifdef HAVE_stack_failure
+      if (HAVE_stack_failure)
+        emit_insn (gen_stack_failure ());
+      else
+#endif
  #ifdef HAVE_trap
        if (HAVE_trap)
          emit_insn (gen_trap ());
@@ -1562,6 +1580,13 @@  probe_stack_range (HOST_WIDE_INT first, rtx size)
        emit_insn (gen_check_stack (addr));
      }
  #endif
+#ifdef HAVE_generic_limit_check_stack
+  else if (HAVE_generic_limit_check_stack)
+    {
+      rtx addr = memory_address (Pmode,stack_pointer_rtx);
+      emit_insn (gen_generic_limit_check_stack (addr));
+    }
+#endif

    /* Otherwise we have to generate explicit probes.  If we have a constant
       small number of them to generate, that's the easy case.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c    (revision 171194)
+++ gcc/config/arm/arm.c    (working copy)
@@ -14542,6 +14542,283 @@  arm_output_function_prologue (FILE *f, HOST_WIDE_I

  }

+/*
+ * Write the prologue part of the stack check into the asm file.
+ * For Thumb this may look like this:
+ *   push {rsym,ramn}
+ *   ldr rsym, .LSPCHK0
+ *   ldr rsym, [rsym]
+ *   ldr ramn, .LSPCHK0 + 4
+ *   add rsym, rsym, ramn
+ *   cmp sp, rsym
+ *   bhs .LSPCHK1
+ *   push {lr}
+ *   bl __thumb_stack_failure
+ * .align 2
+ * .LSPCHK0:
+ *   .word symbol_addr_of(stack_limit_rtx)
+ *   .word length_of(amount)
+ * .LSPCHK1:
+ *   pop {rsym,ramn}
+ */
+void
+stack_check_output_function (FILE *f, int reg0, int reg1, unsigned amount,
+                             unsigned numregs)
+{
+  unsigned amount_needsreg;
+  bool amount_const_ok, is_non_opt_thumb2, is_thumb2_hi_reg[2];
+  bool issym=false;
+  static unsigned spchk_labelno = 0;
+  char ok_lable_str[256];
+  char pool_lable_str[256];
+
+  if (TARGET_THUMB1)
+    amount_const_ok = (amount < 256);
+  else
+    amount_const_ok = const_ok_for_arm (amount);
+
+  if (GET_CODE (stack_limit_rtx) == SYMBOL_REF) /*stack_limit_rtx*/
+    {
+      issym = true;
+      amount_needsreg = !amount_const_ok;
+    }
+  else
+    amount_needsreg = (amount > 0);
+
+  is_non_opt_thumb2 = (TARGET_THUMB2 && !(optimize_size || optimize >= 2));
+  is_thumb2_hi_reg[0] = (TARGET_THUMB2 && reg0>7);
+  is_thumb2_hi_reg[1] = (TARGET_THUMB2 && reg1>7);
+
+  /*build labels for later use*/
+  if ( (issym && !(is_non_opt_thumb2 || is_thumb2_hi_reg[0]))
+     ||(amount && !amount_const_ok
+ && !((issym && is_thumb2_hi_reg[1])
+         || (!issym && is_thumb2_hi_reg[0])
+         || is_non_opt_thumb2)))
+    ASM_GENERATE_INTERNAL_LABEL (pool_lable_str, "LSPCHK", spchk_labelno++);
+  ASM_GENERATE_INTERNAL_LABEL (ok_lable_str, "LSPCHK", spchk_labelno++);
+
+  if (issym && amount) /*need temp regs for limit and amount*/
+    {
+      if (numregs >= 2)
+        ; /*have 2 regs => no need to push*/
+      else if (numregs == 1)
+        {
+          if (amount_needsreg)
+            {
+              /*have one reg but need two regs => push temp reg for amount*/
+              if (TARGET_ARM)
+                asm_fprintf (f, "\tstr\t%r, [%r, #-4]!\n", reg1, SP_REGNUM);
+              else
+                asm_fprintf (f, "\tpush\t{%r}\n", reg1);
+          /*due to additional push try to correct amount*/
+          if (amount >= 4)
+            {
+          if (amount_const_ok)
+            {
+              if (TARGET_THUMB1 || const_ok_for_arm(amount - 4))
+                amount -= 4;
+              /*on Thumb2 or ARM it may not be corrected; shouldn't hurt*/
+            }
+          else /*will be loaded from pool*/
+            amount -= 4;
+            }
+            }
+        }
+      else if (amount_needsreg)
+        {
+          /*have no reg but need two => push temp regs for limit and amount*/
+          if (TARGET_ARM)
+            asm_fprintf (f, "\tstmfd\t%r!, {%r,%r}\n", SP_REGNUM, reg0, reg1);
+          else
+            asm_fprintf (f, "\tpush\t{%r,%r}\n", reg0, reg1);
+          /*due to additional push try to correct amount*/
+          if (amount >= 8)
+            {
+              if (amount_const_ok)
+                {
+                  if (TARGET_THUMB1 || const_ok_for_arm(amount - 8))
+                    amount -= 8;
+                  /*on Thumb2 or ARM it may not be corrected; shouldn't hurt*/
+                }
+              else /*will be loaded from pool*/
+                amount -= 8;
+            }
+        }
+      else
+        {
+          /*have no reg but need one reg => push temp reg for limit*/
+          if (TARGET_ARM)
+            asm_fprintf (f, "\tstr\t%r, [%r, #-4]!\n", reg0, SP_REGNUM);
+          else
+            asm_fprintf (f, "\tpush\t{%r}\n", reg0);
+          /*due to additional push try to correct amount*/
+          if (amount >= 4)
+            {
+              if (amount_const_ok)
+                {
+                  if (TARGET_THUMB1 || const_ok_for_arm(amount - 4))
+                    amount -= 4;
+                  /*on Thumb2 or ARM it may not be corrected; shouldn't hurt*/
+                }
+              else /*will be loaded from pool*/
+                amount -= 4;
+            }
+        }
+    }
+  else if ((issym || amount_needsreg) && numregs == 0)
+    { /*push temp reg either for limit or amount*/
+      if (TARGET_ARM)
+        asm_fprintf (f, "\tstr\t%r, [%r, #-4]!\n", reg0, SP_REGNUM);
+      else
+        asm_fprintf (f, "\tpush\t{%r}\n", reg0);
+    }
+
+  if (issym)
+    {
+      if (is_non_opt_thumb2 || is_thumb2_hi_reg[0])
+        {
+          const char *str ;
+          str = (const char *) XSTR  (stack_limit_rtx, 0);
+          asm_fprintf (f, "\tmovw\t%r, #:lower16:%s\n", reg0, str);
+          asm_fprintf (f, "\tmovt\t%r, #:upper16:%s\n", reg0, str);
+        }
+      else
+        {
+          asm_fprintf (f, "\tldr\t%r, ", reg0);
+          assemble_name (f, pool_lable_str); /* =stack_limit_rtx */
+          fputs ("\n", f);
+        }
+
+      if (flag_stack_check == INDIRECT_STACK_CHECK)
+        asm_fprintf (f, "\tldr\t%r, [%r]\n", reg0, reg0);
+      if (amount)
+        {
+          if (amount_const_ok)
+            {
+              if (TARGET_32BIT)
+                asm_fprintf (f, "\tadds\t%r, %r, #%d\n", reg0, reg0, amount);
+              else
+                asm_fprintf (f, "\tadd\t%r, %r, #%d\n", reg0, reg0, amount);
+            }
+          else
+            {
+              if (is_non_opt_thumb2 || is_thumb2_hi_reg[1])
+                {
+                  asm_fprintf (f, "\tmovw\t%r, #0x%X\n", reg1, amount&0xFFFF);
+                  asm_fprintf (f, "\tmovt\t%r, #0x%X\n", reg1,
+                    (amount>>16)&0xFFFF);
+                }
+              else
+                {
+                  asm_fprintf (f, "\tldr\t%r, ", reg1);
+                  assemble_name (f, pool_lable_str); /* =amount */
+                  if (is_thumb2_hi_reg[0])
+                    fputs ("\n", f);
+                  else
+                    fputs (" + 4\n", f);
+                }
+              asm_fprintf (f, "\tadd\t%r, %r, %r\n", reg0, reg0, reg1);
+            }
+        }
+      asm_fprintf (f, "\tcmp\t%r, %r\n", SP_REGNUM, reg0);
+    }
+  else if (amount)
+    {
+      if (amount_const_ok)
+        asm_fprintf (f, "\tmov\t%r, #%d\n", reg0, amount);
+      else
+        {
+          if (is_non_opt_thumb2 || is_thumb2_hi_reg[0])
+            {
+              asm_fprintf (f, "\tmovw\t%r, #0x%X\n", reg0, amount&0xFFFF);
+              asm_fprintf (f, "\tmovt\t%r, #0x%X\n", reg0, (amount>>16)&0xFFFF);
+            }
+          else
+            {
+              asm_fprintf (f, "\tldr\t%r, ", reg0);
+              assemble_name (f, pool_lable_str); /* amount */
+              fputs ("\n", f);
+            }
+        }
+      asm_fprintf (f, "\tadd\t%r, %r, %r\n", reg0, reg0, REGNO(stack_limit_rtx));
+      asm_fprintf (f, "\tcmp\t%r, %r\n", SP_REGNUM, reg0);
+    }
+  else
+    asm_fprintf (f, "\tcmp\t%r, %r\n", SP_REGNUM, REGNO(stack_limit_rtx));
+  asm_fprintf (f, "\tbhs\t");
+  assemble_name (f, ok_lable_str);
+  fputs ("\n", f);
+
+  if (TARGET_ARM)
+    {
+      asm_fprintf (f, "\tstr\t%r, [%r, #-4]!\n", LR_REGNUM, SP_REGNUM);
+      asm_fprintf (f, "\tbl\t__arm_stack_failure\t%@ stack check\n");
+    }
+  else
+    {
+      asm_fprintf (f, "\tpush\t{%r}\n", LR_REGNUM);
+      asm_fprintf (f, "\tbl\t__thumb_stack_failure\t%@ stack check\n");
+    }
+
+    /*pool*/
+    if ( (issym && !(is_non_opt_thumb2 || is_thumb2_hi_reg[0]))
+       ||(amount && !amount_const_ok
+ && !(  (issym && is_thumb2_hi_reg[1])
+             || (!issym && is_thumb2_hi_reg[0])
+             || is_non_opt_thumb2)))
+    {
+      /*temp regs: collect values from here*/
+      if (!TARGET_ARM)
+        ASM_OUTPUT_ALIGN (f, 2);
+      ASM_OUTPUT_LABEL(f,pool_lable_str);
+      if (issym && !(is_non_opt_thumb2 || is_thumb2_hi_reg[0]))
+        assemble_aligned_integer (UNITS_PER_WORD, stack_limit_rtx);
+      if (amount && !amount_const_ok
+ && !(  (issym && is_thumb2_hi_reg[1])
+             || (!issym && is_thumb2_hi_reg[0])
+             || is_non_opt_thumb2))
+        assemble_aligned_integer (UNITS_PER_WORD, GEN_INT (amount));
+    }
+  ASM_OUTPUT_LABEL(f,ok_lable_str);
+  if (issym && amount) /*pop temp regs used by limit and amount*/
+    {
+      if (numregs >= 2)
+        ; /*no need to pop*/
+      else if (numregs == 1)
+        {
+          if (amount_needsreg)
+            {
+              if (TARGET_ARM)
+                asm_fprintf (f, "\tldr\t%r, [%r], #4\n", reg1, SP_REGNUM);
+              else
+                asm_fprintf (f, "\tpop\t{%r}\n", reg1);
+            }
+        }
+      else if (amount_needsreg)
+        {
+          if (TARGET_ARM)
+            asm_fprintf (f, "\tldmfd\t%r!, {%r,%r}\n", SP_REGNUM, reg0, reg1);
+          else
+            asm_fprintf (f, "\tpop\t{%r,%r}\n", reg0, reg1);
+        }
+      else
+        {
+          if (TARGET_ARM)
+            asm_fprintf (f, "\tldr\t%r, [%r], #4\n", reg0, SP_REGNUM);
+          else
+            asm_fprintf (f, "\tpop\t{%r}\n", reg0);
+        }
+    }
+  else if ((issym || amount_needsreg) && numregs == 0)
+    { /*pop temp reg used by limit or amount*/
+      if (TARGET_ARM)
+        asm_fprintf (f, "\tldr\t%r, [%r], #4\n", reg0, SP_REGNUM);
+      else
+        asm_fprintf (f, "\tpop\t{%r}\n", reg0);
+    }
+}
+
  const char *
  arm_output_epilogue (rtx sibling)
  {
@@ -15714,6 +15991,72 @@  thumb_set_frame_pointer (arm_stack_offsets *offset
    RTX_FRAME_RELATED_P (insn) = 1;
  }

+/* Search for work registers usable by the stack check in the prologue;
+   return the number of registers usable without an extra push/pop.  */
+
+static int
+stack_check_work_registers (rtx *workreg)
+{
+  int reg, i, k, n, nregs;
+
+  if (crtl->args.info.pcs_variant <= ARM_PCS_AAPCS_LOCAL)
+    {
+      nregs = crtl->args.info.aapcs_next_ncrn;
+    }
+  else
+    nregs = crtl->args.info.nregs;
+
+
+  n = 0;
+  i = 0;
+  /* check if we can use one of the argument registers r0..r3 as long as they
+   * are not holding data*/
+  for (reg = 0; reg <= LAST_ARG_REGNUM && i < 2; reg++)
+    {
+      if (  !df_regs_ever_live_p (reg)
+         || (cfun->machine->uses_anonymous_args && crtl->args.pretend_args_size
+ > (LAST_ARG_REGNUM - reg) * UNITS_PER_WORD)
+         || (!cfun->machine->uses_anonymous_args && nregs < reg + 1)
+         )
+        {
+      workreg[i++] = gen_rtx_REG (SImode, reg);
+      n = (reg + 1) % 4;
+        }
+    }
+
+  /* otherwise try to use r4..r7*/
+  for (reg = LAST_ARG_REGNUM + 1; reg <= LAST_LO_REGNUM && i < 2; reg++)
+    {
+      if (  df_regs_ever_live_p (reg)
+ && !fixed_regs[reg]
+ && reg != FP_REGNUM )
+        {
+      workreg[i++] = gen_rtx_REG (SImode, reg);
+        }
+    }
+
+  if (TARGET_32BIT)
+    {
+      /* ARM and Thumb-2 can use high regs.  */
+      for (reg = FIRST_HI_REGNUM; reg <= LAST_HI_REGNUM && i < 2; reg ++)
+        if (  df_regs_ever_live_p (reg)
+ && !fixed_regs[reg]
+ && reg != FP_REGNUM )
+          {
+        workreg[i++] = gen_rtx_REG (SImode, reg);
+          }
+    }
+
+  k = i;
+  /* if not enough were found that can be used without an extra push,
+   * collect the next ones from r0..r4*/
+  for ( ; i<2; i++)
+    workreg[i] = gen_rtx_REG (SImode, n++);
+
+  return k;
+}
+
+
  /* Generate the prologue instructions for entry into an ARM or Thumb-2
     function.  */
  void
@@ -15963,6 +16306,24 @@  arm_expand_prologue (void)
      current_function_static_stack_size
        = offsets->outgoing_args - offsets->saved_args;

+  if (  crtl->limit_stack
+ && !(IS_INTERRUPT (func_type))
+ && (  flag_stack_check == DIRECT_STACK_CHECK
+        || flag_stack_check == INDIRECT_STACK_CHECK)
+ && (offsets->outgoing_args - offsets->saved_args) > 0
+     )
+    {
+      rtx reg[2], num_temp_regs;
+
+      amount = GEN_INT (offsets->outgoing_args - saved_regs
+            - offsets->saved_args);
+      num_temp_regs = GEN_INT (stack_check_work_registers(reg));
+      insn = gen_stack_check (stack_pointer_rtx,
+                              reg[0], reg[1], stack_limit_rtx,
+                              amount, num_temp_regs);
+      insn = emit_insn (insn);
+    }
+
    if (offsets->outgoing_args != offsets->saved_args + saved_regs)
      {
        /* This add can produce multiple insns for a large constant, so we
@@ -21148,6 +21509,26 @@  thumb1_output_function_prologue (FILE *f, HOST_WID
          thumb_pushpop (f, pushable_regs, 1, &cfa_offset, real_regs_mask);
      }
      }
+
+  if(  crtl->limit_stack
+ && (  flag_stack_check == DIRECT_STACK_CHECK
+       || flag_stack_check == INDIRECT_STACK_CHECK)
+ && (offsets->outgoing_args - offsets->saved_args)
+    )
+    {
+      unsigned amount, numregs;
+      int reg0, reg1;
+      rtx reg[2];
+
+      amount = offsets->outgoing_args - offsets->saved_regs;
+      amount -= 4 * thumb1_extra_regs_pushed (offsets, true);
+
+      numregs = stack_check_work_registers(reg);
+      reg0 = REGNO (reg[0]);
+      reg1 = REGNO (reg[1]);
+
+      stack_check_output_function  (f, reg0, reg1, amount, numregs);
+    }
  }

  /* Handle the case of a double word load into a low register from
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md    (revision 171194)
+++ gcc/config/arm/arm.md    (working copy)
@@ -104,6 +104,7 @@ 
     (UNSPEC_SYMBOL_OFFSET 27) ; The offset of the start of the symbol from
                               ; another symbolic address.
     (UNSPEC_MEMORY_BARRIER 28) ; Represent a memory barrier.
+   (UNSPEC_PROBE_STACK      29) ; probe stack memory reference
    ]
  )

@@ -10580,6 +10581,113 @@ 
    [(set_attr "conds" "clob")]
  )

+(define_expand "probe_stack"
+  [(match_operand 0 "memory_operand" "")]
+  "TARGET_EITHER"
+{
+  if (  flag_stack_check == DIRECT_STACK_CHECK
+     || flag_stack_check == INDIRECT_STACK_CHECK)
+    ;
+  else
+    {
+      emit_move_insn (operands[0], const0_rtx);
+      emit_insn (gen_probe_stack_done ());
+      emit_insn (gen_blockage ());
+    }
+  DONE;
+}
+)
+
+(define_insn "probe_stack_done"
+  [(unspec_volatile [(const_int 0)] UNSPEC_PROBE_STACK)]
+  "TARGET_EITHER"
+  {return \"@ probe stack done\";}
+  [(set_attr "type" "store1")
+   (set_attr "length" "0")]
+)
+
+(define_expand "generic_limit_check_stack"
+  [(match_operand 0 "memory_operand" "")]
+  "crtl->limit_stack
+ && flag_stack_check != DIRECT_STACK_CHECK
+ && flag_stack_check != INDIRECT_STACK_CHECK"
+{
+  rtx label = gen_label_rtx ();
+  rtx addr = copy_rtx (operands[0]);
+  addr = gen_rtx_fmt_ee (MINUS, Pmode, addr, GEN_INT (0));
+  addr = force_operand (addr, NULL_RTX);
+  emit_insn (gen_blockage ());
+  emit_cmp_and_jump_insns (stack_limit_rtx, addr, LEU, NULL_RTX, Pmode, 1,
+                           label);
+  emit_insn (gen_stack_failure ());
+  emit_label (label);
+  emit_insn (gen_blockage ());
+  DONE;
+}
+)
+
+(define_insn "stack_check"
+  [(set
+   (match_operand:SI 0 "register_operand" "=k")
+   (match_operand:SI 3 "general_operand"  "sr")
+   )
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:SI 2 "register_operand" "r")
+   (match_operand:SI 4 "general_operand"  "i")
+   (match_operand:SI 5 "general_operand"  "i")
+   (clobber (reg:CC CC_REGNUM))
+  ]
+  "TARGET_32BIT
+ && (operands[3] == stack_limit_rtx)
+ && (GET_CODE (operands[4]) == CONST_INT)
+ && (GET_CODE (operands[5]) == CONST_INT)"
+  "*
+  {
+    int reg0, reg1;
+    unsigned amount, numregs;
+    extern void stack_check_output_function (FILE *, int, int, unsigned,
+                                            unsigned);
+
+    reg0 = REGNO (operands[1]);
+    reg1 = REGNO (operands[2]);
+    amount = INTVAL (operands[4]);
+    numregs = INTVAL (operands[5]);
+
+    stack_check_output_function  (asm_out_file, reg0, reg1, amount, numregs);
+  }
+  return \"\";
+  "
+  [(set_attr "conds" "clob")
+   (set (attr "length")
+   (if_then_else (eq_attr "is_thumb" "yes")
+      (const_int 44)
+      (const_int 52)))]
+)
+
+(define_insn "stack_failure"
+  [(trap_if (const_int 1) (const_int 0))]
+  "TARGET_EITHER"
+  "*
+  {
+    rtx ops[2];
+
+    ops[0] = stack_pointer_rtx;
+    ops[1] = gen_rtx_REG (SImode, LR_REGNUM);
+    if (TARGET_ARM)
+      {
+        output_asm_insn (\"str\\t%1, [%0, #-4]!\", ops);
+        output_asm_insn (\"bl\\t__arm_stack_failure\\t%@ trap call\", ops);
+      }
+    else
+      {
+        output_asm_insn (\"push\\t{%1}\", ops);
+        output_asm_insn (\"bl\\t__thumb_stack_failure\\t%@ trap call\", ops);
+      }
+  }
+  return \"\";
+  "
+)
+
  ;; We only care about the lower 16 bits of the constant