[i386] : Change stack probing and allocation implementation

Message ID 4C9EC74C.4080009@redhat.com
State New

Commit Message

Richard Henderson Sept. 26, 2010, 4:08 a.m. UTC
On 08/21/2010 08:23 AM, Kai Tietz wrote:
> 2010/8/19 Kai Tietz <ktietz70@googlemail.com>:
>> Hello,
>>
>> the behavior of i386's stack allocation with probing (via ___chkstk)
>> has some disadvantages. First the probing code is more costly on
>> execution than necessary - this hits mainly win32 targets where
>> stack-probing is active by default for stack-allocation >= 0x1000
>> bytes - and it is additionally incompatible with the variant of chkstk
>> in msvcrt.
>> Additionally this new version avoids some register clobbering for
>> 64-bit and it simplifies the prologue generation a bit.
>>
>> 2010-08-19  Kai Tietz
>>
>>        * config/i386/cygwin.asm (___chkstk_ms): New.
>>        * config/i386/i386.c (override_options): Replace
>>        gen_allocate_stack_worker_,, by gen_allocate_stack_worker_probe_,,.
>>        (ix86_expand_prologue): Adjust probed stack allocation.
>>        * config/i386/i386.md (define_insn "allocate_stack_worker_32"): Removed.
>>        (define_insn "allocate_stack_worker_64"): Removed.
>>        (define_insn "allocate_stack_worker_probe_32"): New.
>>        (define_insn "allocate_stack_worker_probe_64"): New.
>>        (allocate_stack): Adjust probed stack allocation.

I altered the patch a bit.  Tidied up the .md changes with macros,
tidied up the assembly file with multiple object files and dwarf2
unwind info.  Adjusted the prologue code to use the value in eax.

Tested on i686-cygwin, i686-linux, x86_64-linux, x86_64-mingw.
Committed.


r~
2010-09-25  Kai Tietz  <kai.tietz@onevision.com>
            Richard Henderson  <rth@redhat.com>

        * config/i386/cygwin.asm: Include auto-host.h.
        (cfi_startproc, cfi_endproc, cfi_adjust_cfa_offset,
        cfi_def_cfa_register, cfi_register, cfi_push, cfi_pop): New macros.
        (__chkstk, __alloca): Annotate for dwarf2 unwind info.  Drop
        alignment code from the 64-bit path.  Use gas local labels.
        * config/i386/i386.md (pro_epilogue_adjust_stack_<mode>_2): Macroize
        from _di_2.  Remove the useless constant integer argument.
        (pro_epilogue_adjust_stack_<mode>_3): New.
        (allocate_stack_worker_probe_<mode>): Macroize from
        allocate_stack_worker_{32,64}.  Use __chkstk_ms.  Update all users.
        * config/i386/i386.c (ix86_expand_prologue): Use __chkstk_ms;
        use gen_pro_epilogue_adjust_stack_*_3 and annotate it.
        (__chkstk_ms): New function.
        * config/i386/t-cygming (LIB1ASMFUNCS): Add _chkstk_ms.
        * config/i386/t-interix: Likewise.
        * configure.ac (HAVE_GAS_CFI_DIRECTIVE): Export for target.
        (HAVE_GAS_CFI_PERSONALITY_DIRECTIVE): Likewise.
        (HAVE_GAS_CFI_SECTIONS_DIRECTIVE): Likewise.
        * configure, config.in: Rebuild.
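
For readers who want to see which addresses the new probe-only routine touches, here is a minimal C sketch that mirrors the loop structure of ___chkstk_ms from the patch below. It is only a model: the helper name simulate_chkstk_ms_probes and the PROBE_PAGE constant are hypothetical and do not appear in the patch, and the function records offsets below the caller's stack pointer rather than touching any memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_PAGE 0x1000u  /* 4K step, as in the assembly loop */

/* Hypothetical helper, not part of the patch.  Records, as offsets below
   the caller's stack pointer, each address the probe loop would touch
   for an allocation of SIZE bytes.  Returns the number of probes; at
   most CAP offsets are stored in OUT.  */
size_t
simulate_chkstk_ms_probes (uint64_t size, uint64_t *out, size_t cap)
{
  uint64_t offset = 0;        /* distance below the caller's stack pointer */
  uint64_t remaining = size;  /* plays the role of %eax / %rax */
  size_t n = 0;

  assert (out != NULL || cap == 0);

  if (remaining >= PROBE_PAGE)        /* "jb 2f" is not taken */
    {
      do
        {
          offset += PROBE_PAGE;       /* sub $0x1000 from the pointer */
          if (n < cap)
            out[n] = offset;          /* "or $0x0, (ptr)" probes here */
          n++;
          remaining -= PROBE_PAGE;    /* decrement count */
        }
      while (remaining > PROBE_PAGE); /* "ja 1b" */
    }

  offset += remaining;                /* label 2: step by the remainder */
  if (n < cap)
    out[n] = offset;                  /* final probe */
  n++;
  return n;
}
```

Under this model an allocation of 0x2800 bytes is probed at offsets 0x1000, 0x2000 and 0x2800. Since the routine leaves the stack pointer untouched, the prologue still has to subtract the size itself, which is what the pro_epilogue_adjust_stack_*_3 pattern is used for.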

Comments

H.J. Lu Oct. 20, 2010, 7:11 p.m. UTC | #1
On Sat, Sep 25, 2010 at 9:08 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/21/2010 08:23 AM, Kai Tietz wrote:
>> 2010/8/19 Kai Tietz <ktietz70@googlemail.com>:
>>> Hello,
>>>
>>> the behavior of i386's stack allocation with probing (via ___chkstk)
>>> has some disadvantages. First the probing code is more costly on
>>> execution than necessary - this hits mainly win32 targets where
>>> stack-probing is active by default for stack-allocation >= 0x1000
>>> bytes - and it is additionally incompatible with the variant of chkstk
>>> in msvcrt.
>>> Additionally this new version avoids some register clobbering for
>>> 64-bit and it simplifies the prologue generation a bit.
>>>
>>> 2010-08-19  Kai Tietz
>>>
>>>        * config/i386/cygwin.asm (___chkstk_ms): New.
>>>        * config/i386/i386.c (override_options): Replace
>>>        gen_allocate_stack_worker_,, by gen_allocate_stack_worker_probe_,,.
>>>        (ix86_expand_prologue): Adjust probed stack allocation.
>>>        * config/i386/i386.md (define_insn "allocate_stack_worker_32"): Removed.
>>>        (define_insn "allocate_stack_worker_64"): Removed.
>>>        (define_insn "allocate_stack_worker_probe_32"): New.
>>>        (define_insn "allocate_stack_worker_probe_64"): New.
>>>        (allocate_stack): Adjust probed stack allocation.
>
> I altered the patch a bit.  Tidied up the .md changes with macros,
> tidied up the assembly file with multiple object files and dwarf2
> unwind info.  Adjusted the prologue code to use the value in eax.
>
> Tested on i686-cygwin, i686-linux, x86_64-linux, x86_64-mingw.
> Committed.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46095
Patch

diff --git a/gcc/config.in b/gcc/config.in
index a03b653..574c033 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -936,22 +936,13 @@ 
 
 
 /* Define 0/1 if your assembler supports CFI directives. */
-#ifndef USED_FOR_TARGET
 #undef HAVE_GAS_CFI_DIRECTIVE
-#endif
-
 
 /* Define 0/1 if your assembler supports .cfi_personality. */
-#ifndef USED_FOR_TARGET
 #undef HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
-#endif
-
 
 /* Define 0/1 if your assembler supports .cfi_sections. */
-#ifndef USED_FOR_TARGET
 #undef HAVE_GAS_CFI_SECTIONS_DIRECTIVE
-#endif
-
 
 /* Define if your assembler supports the .loc discriminator sub-directive. */
 #ifndef USED_FOR_TARGET
diff --git a/gcc/config/i386/cygwin.asm b/gcc/config/i386/cygwin.asm
index 588c12e..a6cc94d 100644
--- a/gcc/config/i386/cygwin.asm
+++ b/gcc/config/i386/cygwin.asm
@@ -1,6 +1,7 @@ 
 /* stuff needed for libgcc on win32.
  *
- *   Copyright (C) 1996, 1998, 2001, 2003, 2008, 2009 Free Software Foundation, Inc.
+ *   Copyright (C) 1996, 1998, 2001, 2003, 2008, 2009
+ *   Free Software Foundation, Inc.
  *   Written By Steve Chamberlain
  * 
  * This file is free software; you can redistribute it and/or modify it
@@ -23,104 +24,165 @@ 
  * <http://www.gnu.org/licenses/>.
  */
 
-#ifdef L_chkstk
+#include "auto-host.h"
+
+#ifdef HAVE_GAS_CFI_SECTIONS_DIRECTIVE
+	.cfi_sections	.debug_frame
+# define cfi_startproc()		.cfi_startproc
+# define cfi_endproc()			.cfi_endproc
+# define cfi_adjust_cfa_offset(X) 	.cfi_adjust_cfa_offset X
+# define cfi_def_cfa_register(X)	.cfi_def_cfa_register X
+# define cfi_register(D,S)		.cfi_register D, S
+# ifdef _WIN64
+#  define cfi_push(X)		.cfi_adjust_cfa_offset 8; .cfi_rel_offset X, 0
+#  define cfi_pop(X)		.cfi_adjust_cfa_offset -8; .cfi_restore X
+# else
+#  define cfi_push(X)		.cfi_adjust_cfa_offset 4; .cfi_rel_offset X, 0
+#  define cfi_pop(X)		.cfi_adjust_cfa_offset -4; .cfi_restore X
+# endif
+#else
+# define cfi_startproc()
+# define cfi_endproc()
+# define cfi_adjust_cfa_offset(X)
+# define cfi_def_cfa_register(X)
+# define cfi_register(D,S)
+# define cfi_push(X)
+# define cfi_pop(X)
+#endif /* HAVE_GAS_CFI_SECTIONS_DIRECTIVE */
 
-/* Function prologue calls _alloca to probe the stack when allocating more
+#ifdef L_chkstk
+/* Function prologue calls __chkstk to probe the stack when allocating more
    than CHECK_STACK_LIMIT bytes in one go.  Touching the stack at 4K
    increments is necessary to ensure that the guard pages used
    by the OS virtual memory manger are allocated in correct sequence.  */
 
 	.global ___chkstk
 	.global	__alloca
-#ifndef _WIN64
-___chkstk:
+#ifdef _WIN64
+/* __alloca is a normal function call, which uses %rcx as the argument.  */
+	cfi_startproc()
 __alloca:
-	pushl	%ecx		/* save temp */
-	leal	8(%esp), %ecx	/* point past return addr */
-	cmpl	$0x1000, %eax	/* > 4k ?*/
-	jb	Ldone
+	movq	%rcx, %rax
+	/* FALLTHRU */
 
-Lprobe:
-	subl	$0x1000, %ecx  		/* yes, move pointer down 4k*/
-	orl	$0x0, (%ecx)   		/* probe there */
-	subl	$0x1000, %eax  	 	/* decrement count */
-	cmpl	$0x1000, %eax
-	ja	Lprobe         	 	/* and do it again */
+/* ___chkstk is a *special* function call, which uses %rax as the argument.
+   We avoid clobbering the 4 integer argument registers, %rcx, %rdx, 
+   %r8 and %r9, which leaves us with %rax, %r10, and %r11 to use.  */
+	.align	4
+___chkstk:
+	popq	%r11			/* pop return address */
+	cfi_adjust_cfa_offset(-8)	/* indicate return address in r11 */
+	cfi_register(%rip, %r11)
+	movq	%rsp, %r10
+	cmpq	$0x1000, %rax		/* > 4k ?*/
+	jb	2f
 
-Ldone:
-	subl	%eax, %ecx	   
-	orl	$0x0, (%ecx)	/* less than 4k, just peek here */
+1:	subq	$0x1000, %r10  		/* yes, move pointer down 4k*/
+	orl	$0x0, (%r10)   		/* probe there */
+	subq	$0x1000, %rax  	 	/* decrement count */
+	cmpq	$0x1000, %rax
+	ja	1b			/* and do it again */
 
-	movl	%esp, %eax	/* save old stack pointer */
-	movl	%ecx, %esp	/* decrement stack */
-	movl	(%eax), %ecx	/* recover saved temp */
-	movl	4(%eax), %eax	/* recover return address */
+2:	subq	%rax, %r10
+	movq	%rsp, %rax		/* hold CFA until return */
+	cfi_def_cfa_register(%rax)
+	orl	$0x0, (%r10)		/* less than 4k, just peek here */
+	movq	%r10, %rsp		/* decrement stack */
 
 	/* Push the return value back.  Doing this instead of just
-	   jumping to %eax preserves the cached call-return stack
+	   jumping to %r11 preserves the cached call-return stack
 	   used by most modern processors.  */
-	pushl	%eax
+	pushq	%r11
 	ret
+	cfi_endproc()
 #else
-/* __alloca is a normal function call, which uses %rcx as the argument.  And stack space
-   for the argument is saved.  */
+	cfi_startproc()
+___chkstk:
 __alloca:
- 	movq	%rcx, %rax
-	addq	$0x7, %rax
-	andq	$0xfffffffffffffff8, %rax
-	popq	%rcx		/* pop return address */
-	popq	%r10		/* Pop the reserved stack space.  */
-	movq	%rsp, %r10	/* get sp */
-	cmpq	$0x1000, %rax	/* > 4k ?*/
-	jb	Ldone_alloca
-
-Lprobe_alloca:
-	subq	$0x1000, %r10  		/* yes, move pointer down 4k*/
-	orq	$0x0, (%r10)   		/* probe there */
-	subq	$0x1000, %rax  	 	/* decrement count */
-	cmpq	$0x1000, %rax
-	ja	Lprobe_alloca         	 	/* and do it again */
+	pushl	%ecx			/* save temp */
+	cfi_push(%eax)
+	leal	8(%esp), %ecx		/* point past return addr */
+	cmpl	$0x1000, %eax		/* > 4k ?*/
+	jb	2f
+
+1:	subl	$0x1000, %ecx  		/* yes, move pointer down 4k*/
+	orl	$0x0, (%ecx)   		/* probe there */
+	subl	$0x1000, %eax  	 	/* decrement count */
+	cmpl	$0x1000, %eax
+	ja	1b			/* and do it again */
 
-Ldone_alloca:
-	subq	%rax, %r10
-	orq	$0x0, (%r10)	/* less than 4k, just peek here */
-	movq	%r10, %rax
-	subq	$0x8, %r10	/* Reserve argument stack space.  */
-	movq	%r10, %rsp	/* decrement stack */
+2:	subl	%eax, %ecx	   
+	orl	$0x0, (%ecx)		/* less than 4k, just peek here */
+	movl	%esp, %eax		/* save current stack pointer */
+	cfi_def_cfa_register(%eax)
+	movl	%ecx, %esp		/* decrement stack */
+	movl	(%eax), %ecx		/* recover saved temp */
 
-	/* Push the return value back.  Doing this instead of just
-	   jumping to %rcx preserves the cached call-return stack
-	   used by most modern processors.  */
-	pushq	%rcx
+	/* Copy the return register.  Doing this instead of just jumping to
+	   the address preserves the cached call-return stack used by most
+	   modern processors.  */
+	pushl	4(%eax)
 	ret
+	cfi_endproc()
+#endif /* _WIN64 */
+#endif /* L_chkstk */
 
-/* ___chkstk is a *special* function call, which uses %rax as the argument.
-   We avoid clobbering the 4 integer argument registers, %rcx, %rdx, 
-   %r8 and %r9, which leaves us with %rax, %r10, and %r11 to use.  */
-___chkstk:
-	addq	$0x7, %rax	/* Make sure stack is on alignment of 8.  */
-	andq	$0xfffffffffffffff8, %rax
-	popq	%r11		/* pop return address */
-	movq	%rsp, %r10	/* get sp */
-	cmpq	$0x1000, %rax	/* > 4k ?*/
-	jb	Ldone
-
-Lprobe:
-	subq	$0x1000, %r10  		/* yes, move pointer down 4k*/
-	orl	$0x0, (%r10)   		/* probe there */
+#ifdef L_chkstk_ms
+/* ___chkstk_ms is a *special* function call, which uses %rax as the argument.
+   We avoid clobbering any registers.  Unlike ___chkstk, it just probes the
+   stack and does no stack allocation.  */
+	.global ___chkstk_ms
+#ifdef _WIN64
+	cfi_startproc()
+___chkstk_ms:
+	pushq	%rcx			/* save temps */
+	cfi_push(%rcx)
+	pushq	%rax
+	cfi_push(%rax)
+	cmpq	$0x1000, %rax		/* > 4k ?*/
+	leaq	24(%rsp), %rcx		/* point past return addr */
+	jb	2f
+
+1:	subq	$0x1000, %rcx  		/* yes, move pointer down 4k */
+	orq	$0x0, (%rcx)   		/* probe there */
 	subq	$0x1000, %rax  	 	/* decrement count */
 	cmpq	$0x1000, %rax
-	ja	Lprobe         	 	/* and do it again */
+	ja	1b			/* and do it again */
 
-Ldone:
-	subq	%rax, %r10
-	orl	$0x0, (%r10)	/* less than 4k, just peek here */
-	movq	%r10, %rsp	/* decrement stack */
+2:	subq	%rax, %rcx
+	orq	$0x0, (%rcx)		/* less than 4k, just peek here */
 
-	/* Push the return value back.  Doing this instead of just
-	   jumping to %r11 preserves the cached call-return stack
-	   used by most modern processors.  */
-	pushq	%r11
+	popq	%rax
+	cfi_pop(%rax)
+	popq	%rcx
+	cfi_pop(%rcx)
+	ret
+	cfi_endproc()
+#else
+	cfi_startproc()
+___chkstk_ms:
+	pushl	%ecx			/* save temp */
+	cfi_push(%ecx)
+	pushl	%eax
+	cfi_push(%eax)
+	cmpl	$0x1000, %eax		/* > 4k ?*/
+	leal	12(%esp), %ecx		/* point past return addr */
+	jb	2f
+
+1:	subl	$0x1000, %ecx  		/* yes, move pointer down 4k*/
+	orl	$0x0, (%ecx)   		/* probe there */
+	subl	$0x1000, %eax  	 	/* decrement count */
+	cmpl	$0x1000, %eax
+	ja	1b			/* and do it again */
+
+2:	subl	%eax, %ecx
+	orl	$0x0, (%ecx)		/* less than 4k, just peek here */
+
+	popl	%eax
+	cfi_pop(%eax)
+	popl	%ecx
+	cfi_pop(%ecx)
 	ret
-#endif
-#endif
+	cfi_endproc()
+#endif /* _WIN64 */
+#endif /* L_chkstk_ms */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eb7f65f..788ea4e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3661,7 +3661,7 @@  ix86_option_override_internal (bool main_args_p)
       ix86_gen_one_cmpl2 = gen_one_cmpldi2;
       ix86_gen_monitor = gen_sse3_monitor64;
       ix86_gen_andsp = gen_anddi3;
-      ix86_gen_allocate_stack_worker = gen_allocate_stack_worker_64;
+      ix86_gen_allocate_stack_worker = gen_allocate_stack_worker_probe_di;
       ix86_gen_adjust_stack_and_probe = gen_adjust_stack_and_probedi;
       ix86_gen_probe_stack_range = gen_probe_stack_rangedi;
     }
@@ -3674,7 +3674,7 @@  ix86_option_override_internal (bool main_args_p)
       ix86_gen_one_cmpl2 = gen_one_cmplsi2;
       ix86_gen_monitor = gen_sse3_monitor;
       ix86_gen_andsp = gen_andsi3;
-      ix86_gen_allocate_stack_worker = gen_allocate_stack_worker_32;
+      ix86_gen_allocate_stack_worker = gen_allocate_stack_worker_probe_si;
       ix86_gen_adjust_stack_and_probe = gen_adjust_stack_and_probesi;
       ix86_gen_probe_stack_range = gen_probe_stack_rangesi;
     }
@@ -8796,8 +8796,7 @@  pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
       insn = emit_insn (gen_rtx_SET (DImode, tmp, offset));
       if (style < 0)
 	RTX_FRAME_RELATED_P (insn) = 1;
-      insn = emit_insn (gen_pro_epilogue_adjust_stack_di_2 (dest, src, tmp,
-							    offset));
+      insn = emit_insn (gen_pro_epilogue_adjust_stack_di_2 (dest, src, tmp));
     }
 
   if (style >= 0)
@@ -9720,16 +9719,26 @@  ix86_expand_prologue (void)
 	}
 
       emit_move_insn (eax, GEN_INT (allocate));
+      emit_insn (ix86_gen_allocate_stack_worker (eax, eax));
 
-      insn = emit_insn (ix86_gen_allocate_stack_worker (eax, eax));
+      /* Use the fact that AX still contains ALLOCATE.  */
+      if (TARGET_64BIT)
+	insn = gen_pro_epilogue_adjust_stack_di_3 (stack_pointer_rtx,
+					           stack_pointer_rtx, eax);
+      else
+	insn = gen_pro_epilogue_adjust_stack_si_3 (stack_pointer_rtx,
+					           stack_pointer_rtx, eax);
+      insn = emit_insn (insn);
 
       if (m->fs.cfa_reg == stack_pointer_rtx)
 	{
 	  m->fs.cfa_offset += allocate;
-	  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (-allocate));
-	  t = gen_rtx_SET (VOIDmode, stack_pointer_rtx, t);
-	  add_reg_note (insn, REG_CFA_ADJUST_CFA, t);
+
 	  RTX_FRAME_RELATED_P (insn) = 1;
+	  add_reg_note (insn, REG_CFA_ADJUST_CFA,
+			gen_rtx_SET (VOIDmode, stack_pointer_rtx,
+				     plus_constant (stack_pointer_rtx,
+						    -allocate)));
 	}
       m->fs.sp_offset += allocate;
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index adf528f..fddacd5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16242,52 +16242,35 @@ 
 	      (const_string "*")))
    (set_attr "mode" "<MODE>")])
 
-(define_insn "pro_epilogue_adjust_stack_di_2"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(plus:DI (match_operand:DI 1 "register_operand" "0,r")
-		 (match_operand:DI 3 "immediate_operand" "i,i")))
-   (use (match_operand:DI 2 "register_operand" "r,l"))
+(define_insn "pro_epilogue_adjust_stack_<mode>_2"
+  [(set (match_operand:P 0 "register_operand" "=r")
+	(plus:P (match_operand:DI 1 "register_operand" "0")
+		 (match_operand:DI 2 "register_operand" "r")))
    (clobber (reg:CC FLAGS_REG))
    (clobber (mem:BLK (scratch)))]
-  "TARGET_64BIT"
-{
-  switch (get_attr_type (insn))
-    {
-    case TYPE_ALU:
-      return "add{q}\t{%2, %0|%0, %2}";
-
-    case TYPE_LEA:
-      operands[2] = gen_rtx_PLUS (DImode, operands[1], operands[2]);
-      return "lea{q}\t{%a2, %0|%0, %a2}";
-
-    default:
-      gcc_unreachable ();
-    }
-}
-  [(set_attr "type" "alu,lea")
-   (set_attr "mode" "DI")])
+  ""
+  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
 
-(define_insn "allocate_stack_worker_32"
-  [(set (match_operand:SI 0 "register_operand" "=a")
-	(unspec_volatile:SI [(match_operand:SI 1 "register_operand" "0")]
-			    UNSPECV_STACK_PROBE))
-   (set (reg:SI SP_REG) (minus:SI (reg:SI SP_REG) (match_dup 1)))
-   (clobber (reg:CC FLAGS_REG))]
-  "!TARGET_64BIT && ix86_target_stack_probe ()"
-  "call\t___chkstk"
-  [(set_attr "type" "multi")
-   (set_attr "length" "5")])
+(define_insn "pro_epilogue_adjust_stack_<mode>_3"
+  [(set (match_operand:P 0 "register_operand" "=r")
+	(minus:P (match_operand:P 1 "register_operand" "0")
+		 (match_operand:P 2 "register_operand" "r")))
+   (clobber (reg:CC FLAGS_REG))
+   (clobber (mem:BLK (scratch)))]
+  ""
+  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
 
-(define_insn "allocate_stack_worker_64"
-  [(set (match_operand:DI 0 "register_operand" "=a")
-	(unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")]
+(define_insn "allocate_stack_worker_probe_<mode>"
+  [(set (match_operand:P 0 "register_operand" "=a")
+	(unspec_volatile:P [(match_operand:P 1 "register_operand" "0")]
 			    UNSPECV_STACK_PROBE))
-   (set (reg:DI SP_REG) (minus:DI (reg:DI SP_REG) (match_dup 1)))
-   (clobber (reg:DI R10_REG))
-   (clobber (reg:DI R11_REG))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_target_stack_probe ()"
-  "call\t___chkstk"
+  "ix86_target_stack_probe ()"
+  "call\t___chkstk_ms"
   [(set_attr "type" "multi")
    (set_attr "length" "5")])
 
@@ -16312,15 +16295,15 @@ 
     }
   else
     {
-      rtx (*gen_allocate_stack_worker) (rtx, rtx);
-
+      x = copy_to_mode_reg (Pmode, operands[1]);
       if (TARGET_64BIT)
-	gen_allocate_stack_worker = gen_allocate_stack_worker_64;
+        emit_insn (gen_allocate_stack_worker_probe_di (x, x));
       else
-	gen_allocate_stack_worker = gen_allocate_stack_worker_32;
-
-      x = copy_to_mode_reg (Pmode, operands[1]);
-      emit_insn (gen_allocate_stack_worker (x, x));
+        emit_insn (gen_allocate_stack_worker_probe_si (x, x));
+      x = expand_simple_binop (Pmode, MINUS, stack_pointer_rtx, x,
+			       stack_pointer_rtx, 0, OPTAB_DIRECT);
+      if (x != stack_pointer_rtx)
+	emit_move_insn (stack_pointer_rtx, x);
     }
 
   emit_move_insn (operands[0], virtual_stack_dynamic_rtx);
diff --git a/gcc/config/i386/t-cygming b/gcc/config/i386/t-cygming
index 0a65ffd..183e545 100644
--- a/gcc/config/i386/t-cygming
+++ b/gcc/config/i386/t-cygming
@@ -17,7 +17,7 @@ 
 # <http://www.gnu.org/licenses/>.
 
 LIB1ASMSRC = i386/cygwin.asm
-LIB1ASMFUNCS = _chkstk
+LIB1ASMFUNCS = _chkstk _chkstk_ms
 
 # cygwin and mingw always have a limits.h, but, depending upon how we are
 # doing the build, it may not be installed yet.
diff --git a/gcc/config/i386/t-interix b/gcc/config/i386/t-interix
index 9a25831..30539e2 100644
--- a/gcc/config/i386/t-interix
+++ b/gcc/config/i386/t-interix
@@ -1,5 +1,5 @@ 
 LIB1ASMSRC = i386/cygwin.asm
-LIB1ASMFUNCS = _chkstk
+LIB1ASMFUNCS = _chkstk _chkstk_ms
 
 winnt.o: $(srcdir)/config/i386/winnt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
   $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
diff --git a/gcc/configure b/gcc/configure
index c392323..b7a1c11 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -21662,12 +21662,14 @@  else
   gcc_cv_as_cfi_advance_working=no
 fi
 
+
 cat >>confdefs.h <<_ACEOF
 #define HAVE_GAS_CFI_DIRECTIVE `if test $gcc_cv_as_cfi_directive = yes \
        && test $gcc_cv_as_cfi_advance_working = yes; then echo 1; else echo 0; fi`
 _ACEOF
 
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for cfi personality directive" >&5
 $as_echo_n "checking assembler for cfi personality directive... " >&6; }
 if test "${gcc_cv_as_cfi_personality_directive+set}" = set; then :
@@ -21749,6 +21751,7 @@  fi
 $as_echo "$gcc_cv_as_cfi_sections_directive" >&6; }
 
 
+
 cat >>confdefs.h <<_ACEOF
 #define HAVE_GAS_CFI_SECTIONS_DIRECTIVE `if test $gcc_cv_as_cfi_sections_directive = yes;
     then echo 1; else echo 0; fi`
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 6ada451..e9a8614 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2388,11 +2388,13 @@  else
   # no objdump, err on the side of caution
   gcc_cv_as_cfi_advance_working=no
 fi
+GCC_TARGET_TEMPLATE(HAVE_GAS_CFI_DIRECTIVE)
 AC_DEFINE_UNQUOTED(HAVE_GAS_CFI_DIRECTIVE,
   [`if test $gcc_cv_as_cfi_directive = yes \
        && test $gcc_cv_as_cfi_advance_working = yes; then echo 1; else echo 0; fi`],
   [Define 0/1 if your assembler supports CFI directives.])
 
+GCC_TARGET_TEMPLATE(HAVE_GAS_CFI_PERSONALITY_DIRECTIVE)
 gcc_GAS_CHECK_FEATURE([cfi personality directive],
   gcc_cv_as_cfi_personality_directive, ,,
 [	.text
@@ -2426,6 +2428,7 @@  gcc_GAS_CHECK_FEATURE([cfi sections directive],
     gcc_cv_as_cfi_sections_directive=yes
     ;;
 esac])
+GCC_TARGET_TEMPLATE(HAVE_GAS_CFI_SECTIONS_DIRECTIVE)
 AC_DEFINE_UNQUOTED(HAVE_GAS_CFI_SECTIONS_DIRECTIVE,
   [`if test $gcc_cv_as_cfi_sections_directive = yes;
     then echo 1; else echo 0; fi`],