Patchwork [CFT] x64 SEH, test 2

Submitter Richard Henderson
Date Aug. 18, 2010, 4:40 p.m.
Message ID <4C6C0CF4.6060608@redhat.com>
Permalink /patch/62058/
State New

Comments

Richard Henderson - Aug. 18, 2010, 4:40 p.m.
As mentioned on IRC, it's supposed to disable re-aligned frames,
since SEH simply cannot represent this in its unwind info.

We'll have to come up with some other mechanism to handle
highly aligned user data.  E.g. via alloca.  Which reminds me,
didn't the AdaCore folk contribute something along those lines
once upon a time?

Please test.


r~
Olivier Hainque - Aug. 18, 2010, 9:35 p.m.
Richard Henderson wrote:
> We'll have to come up with some other mechanism to handle
> highly aligned user data.  E.g. via alloca.  Which reminds me,
> didn't the AdaCore folk contribute something along those lines
> once upon a time?

 Hmm, we have the make_aligning_type circuitry in gigi (gcc-interface/decl.c)
 for this purpose in Ada.  This triggers for alignments > BIGGEST_ALIGNMENT
 (assumes that things are fine up to that, counting on the stack realignment
 for x86).

 The general idea is to craft a record with a field whose offset expression
 derives from the record's base address.

 Olivier
Richard Henderson - Aug. 18, 2010, 10:03 p.m.
On 08/18/2010 02:35 PM, Olivier Hainque wrote:
> Richard Henderson wrote:
>> We'll have to come up with some other mechanism to handle
>> highly aligned user data.  E.g. via alloca.  Which reminds me,
>> didn't the AdaCore folk contribute something along those lines
>> once upon a time?
> 
>  Hmm, we have the make_aligning_type circuitry in gigi (gcc-interface/decl.c)
>  for this purpose in Ada.  This triggers for alignments > BIGGEST_ALIGNMENT
>  (assumes that things are fine up to that, counting on the stack realignment
>  for x86).
> 
>  The general idea is to craft a record with a field whose offset expression
>  derives from the record's base address.

Ah, I hadn't realized that you did it in the Ada front end.

I was thinking along the lines of adding an alignment test to the
existing variable partitioning scheme in cfgexpand.c.  We'd then
squash together all partitions that require alignment larger than
we can provide with MAX_STACK_ALIGNMENT.  We'd then allocate this
highly aligned partition with the alloca machinery, and set the
DECL_RTL for all affected variables so that they point into the
alloca block.
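
A minimal sketch of that idea in plain C, not GCC internals: each highly aligned variable gets an offset inside one shared block, the block is over-allocated by (alignment - 1) bytes, and its base is rounded up. The names hi_var, align_up, layout_block and raw_size are hypothetical and exist only for this illustration; in the compiler the raw storage would come from the alloca machinery and the offsets would end up in each variable's DECL_RTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one highly aligned local variable.  */
struct hi_var
{
  size_t size;    /* size in bytes */
  size_t align;   /* required alignment, a power of 2 */
  size_t offset;  /* filled in by layout_block */
};

/* Round VALUE up to the next multiple of ALIGN (a power of 2).  */
static uintptr_t
align_up (uintptr_t value, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(uintptr_t) (align - 1);
}

/* Assign each variable an offset inside one shared block.  Returns the
   block size and stores the block's own alignment in *BLOCK_ALIGN.  */
static size_t
layout_block (struct hi_var *vars, size_t n, size_t *block_align)
{
  size_t size = 0, max_align = 1;

  for (size_t i = 0; i < n; i++)
    {
      size = align_up (size, vars[i].align);
      vars[i].offset = size;
      size += vars[i].size;
      if (vars[i].align > max_align)
        max_align = vars[i].align;
    }
  *block_align = max_align;
  return size;
}

/* Number of raw bytes to request so that an aligned block of SIZE bytes
   always fits, whatever the alignment of the raw storage.  */
static size_t
raw_size (size_t size, size_t block_align)
{
  return size + block_align - 1;
}
```

A caller would request raw_size bytes (from alloca in the scheme above), pass the returned address through align_up, and address each variable as base + offset.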

This scheme handles user data just fine, but does not handle any
additional alignment required by the register allocator (i.e. 
vector data spills) or by outgoing function parameters.

For the specific case of x64 windows, we are already guaranteed
16-byte stack alignment, and so there should be no extra alignment
needed by either of the above.  (This assumes that AVX spills are
performed with unaligned stores/loads, which is supposed to 
perform well on cpus with that extension.)

However, the scheme does not require _any_ special target support,
and so there are plenty of targets that ought to be able to benefit
from being able to get highly aligned user data.


r~
Olivier Hainque - Aug. 19, 2010, 8:41 a.m.
Richard Henderson wrote:
> I hadn't realized that you did it in the Ada front end.

 Some aspects of the triggers are Ada specific, I think. The central
 facility (make_aligning_type) resorts to a couple of gigi facilities
 but could probably be genericized.

 expand_expr used to know about what it does pretty intimately via
 is_aligning_offset. A while ago, we changed the way the field offset
 is constructed and at the time _needed_ to adjust is_aligning_offset
 accordingly (making it more general; commentary from an old patch
 below). We were getting crashes otherwise, IIRC.

 This need sort of went away with GCC 4 - no more crashes, testcases
 behaving properly etc. I haven't investigated why in detail.
 
> I was thinking along the lines of adding an alignment test to the
> existing variable partitioning scheme in cfgexpand.c.  We'd then
> squash together all partitions that require alignment larger than
> we can provide with MAX_STACK_ALIGNMENT.  We'd then allocate this
> highly aligned partition with the alloca machinery, and set the
> DECL_RTL for all affected variables so that they point into the
> alloca block.
> 
> This scheme handles user data just fine, but does not handle any
> additional alignment required by the register allocator (i.e. 
> vector data spills) or by outgoing function parameters.
> 
> For the specific case of x64 windows, we are already guaranteed
> 16-byte stack alignment, and so there should be no extra alignment
> needed by either of the above.  (This assumes that AVX spills are
> performed with unaligned stores/loads, which is supposed to 
> perform well on cpus with that extension.)
> 
> However, the scheme does not require _any_ special target support,
> and so there are plenty of targets that ought to be able to benefit
> from being able to get highly aligned user data.

 Sounds like a fine plan indeed.

 One aspect that the gigi bits handle is allocators (ptr := new Bla;),
 working fine as long as releasing is performed using a type with
 alignment characteristics similar to what was used at allocation time.

 Olivier

--

Comments part of old patch to is_aligning_offset, explaining what
it was doing:

+   /* The idea is to match a pattern like the one sketched below:
+
+                  |<---------- OFFSET --------->|
+                  +-----------+-----------------+
+                  |           |      .....      |
+                  +-----------+-----------------+
+                  |<- PHASE ->|<---- DELTA ---->|
+                  o                             o
+                  X                        % ALIGN = 0
+      ...
+
+      that is: OFFSET = PHASE + DELTA, with PHASE a constant, ALIGN a power of
+      2 and then (X+OFFSET) % ALIGN == 0, or (X+PHASE+DELTA) % ALIGN == 0.
+
+      We then model DELTA as the "offset from X+PHASE to get to the next
+      address multiple of ALIGN", and expect that value to be expressed as
+      -(X+PHASE) & (ALIGN-1).
+
+      We are then scanning for OFFSETs looking like
+
+        ()
+        PLUS_EXPR
+           ()
+           BIT_AND_EXPR ............o
+              ()                    |
+              NEGATE_EXPR           |
+                 PLUS_EXPR          |  Aligning
+                    ()              |   DELTA
+                    ADDR_EXPR (X)   |
+                    CONST (PHASE)   |
+              ALIGN - 1 ............o
+           CONST (PHASE)
+      ...
+
+      with X either EXP or a placeholder of the same type as EXP's, ALIGN a
+      power of 2, and possible intermediate conversions at the () points.  The
+      conversions might involve integer/size/bitsize types in both directions,
+      hence not preserve modes.
+
+      We also match simplified forms of OFFSET with PHASE = 0, that is:
+
+        ()
+        BIT_AND_EXPR
+           ()
+           NEGATE_EXPR
+              ()
+              ADDR_EXPR (X)
+           ALIGN - 1
+
+       with the same assumptions on X and ALIGN.  */
+
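
The arithmetic behind the pattern described in that comment can be checked with a small standalone C sketch. It only illustrates the formula DELTA = -(X+PHASE) & (ALIGN-1); the function names aligning_delta and aligning_offset are hypothetical and are not part of gigi or expand_expr.

```c
#include <assert.h>
#include <stdint.h>

/* Padding to add after X + PHASE so that the result is a multiple of
   ALIGN, which must be a power of 2.  Mirrors -(X+PHASE) & (ALIGN-1),
   using unsigned (modular) arithmetic for the negation.  */
static uintptr_t
aligning_delta (uintptr_t x, uintptr_t phase, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (0 - (x + phase)) & (align - 1);
}

/* Full field offset from the record base X: OFFSET = PHASE + DELTA.  */
static uintptr_t
aligning_offset (uintptr_t x, uintptr_t phase, uintptr_t align)
{
  return phase + aligning_delta (x, phase, align);
}
```

For a record base of 0x1004, a PHASE of 8 and an ALIGN of 64, the delta is 52 and the field lands at 0x1004 + 60 = 0x1040, a multiple of 64.
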
Kai Tietz - Aug. 21, 2010, 12:14 p.m.
2010/8/18 Richard Henderson <rth@redhat.com>:
> As mentioned on IRC, it's supposed to disable re-aligned frames,
> since SEH simply cannot represent this in its unwind info.
>
> We'll have to come up with some other mechanism to handle
> highly aligned user data.  E.g. via alloca.  Which reminds me,
> didn't the AdaCore folk contribute something along those lines
> once upon a time?
>
> Please test.
>
>
> r~
>

Hi, I fixed one small nit in ix86_compute_frame_layout, which caused an
ICE in winnt.c during frame output. For x64 SSE, it is essential that
'frame->sse_reg_save_offset' points to a 16-byte aligned offset, even
if no SSE register is used at all.

Kai
Kai Tietz - Aug. 21, 2010, 2 p.m.
2010/8/21 Kai Tietz <ktietz70@googlemail.com>:
> 2010/8/18 Richard Henderson <rth@redhat.com>:
>> As mentioned on IRC, it's supposed to disable re-aligned frames,
>> since SEH simply cannot represent this in its unwind info.
>>
>> We'll have to come up with some other mechanism to handle
>> highly aligned user data.  E.g. via alloca.  Which reminds me,
>> didn't the AdaCore folk contribute something along those lines
>> once upon a time?
>>
>> Please test.
>>
>>
>> r~
>>
>
> Hi, I fixed one small nit in ix86_compute_frame_layout, which caused an
> ICE in winnt.c during frame output. For x64 SSE, it is essential that
> 'frame->sse_reg_save_offset' points to a 16-byte aligned offset, even
> if no SSE register is used at all.

Well, I found that this change would break x86_64, as an alignment is
not necessarily done there. So I adjusted the patch in winnt.c instead.
It used the offset to the SSE registers as the maximum stack frame
offset, which is equal to the offset of the general-register saves when
no SSE registers are used. The offset to use as the maximum here is
frame_pointer_offset instead.
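
The constraint at issue corresponds to the assertions in seh_emit_setframe in the patch: the operand of .seh_setframe is the chosen maximum offset minus the current CFA offset, and it must be non-negative and a multiple of 16. A rough standalone sketch of that check, with hypothetical helper names (round_up_16, setframe_offset) and int64_t standing in for HOST_WIDE_INT:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a non-negative OFFSET up to the next multiple of 16.  */
static int64_t
round_up_16 (int64_t offset)
{
  assert (offset >= 0);
  return (offset + 15) & ~(int64_t) 15;
}

/* Mirror of the checks in seh_emit_setframe: MAX_OFFSET is the CFA offset
   chosen for the frame-pointer base, CFA_OFFSET the current CFA offset.
   On success, store the .seh_setframe operand in *OUT.  */
static bool
setframe_offset (int64_t max_offset, int64_t cfa_offset, int64_t *out)
{
  int64_t offset = max_offset - cfa_offset;

  if (offset < 0 || (offset & 15) != 0)
    return false;
  *out = offset;
  return true;
}
```

With a CFA offset of 16, a maximum offset of 48 yields an operand of 32, while a maximum offset of 40 is rejected because 24 is not a multiple of 16.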

I am just bootstrapping.

Regards,
Kai
Kai Tietz - Aug. 22, 2010, 8:05 p.m.
2010/8/21 Kai Tietz <ktietz70@googlemail.com>:
> 2010/8/21 Kai Tietz <ktietz70@googlemail.com>:
>> 2010/8/18 Richard Henderson <rth@redhat.com>:
>>> As mentioned on IRC, it's supposed to disable re-aligned frames,
>>> since SEH simply cannot represent this in its unwind info.
>>>
>>> We'll have to come up with some other mechanism to handle
>>> highly aligned user data.  E.g. via alloca.  Which reminds me,
>>> didn't the AdaCore folk contribute something along those lines
>>> once upon a time?
>>>
>>> Please test.
>>>
>>>
>>> r~
>>>
>>
>> Hi, I fixed one small nit in ix86_compute_frame_layout, which caused an
>> ICE in winnt.c during frame output. For x64 SSE, it is essential that
>> 'frame->sse_reg_save_offset' points to a 16-byte aligned offset, even
>> if no SSE register is used at all.
>
> Well, I found that this change would break x86_64, as an alignment is
> not necessarily done there. So I adjusted the patch in winnt.c instead.
> It used the offset to the SSE registers as the maximum stack frame
> offset, which is equal to the offset of the general-register saves when
> no SSE registers are used. The offset to use as the maximum here is
> frame_pointer_offset instead.
>
> I am just bootstrapping.
>
> Regards,
> Kai

Well, the bootstrap shows that the patch works for C and Fortran. For
C++, the function 'i386_pe_seh_init' in winnt.c gets called with a
partially set up cfun. Maybe before reload; at least at some point
cfun->cfg and other parts of cfun have not been set up, which causes an
ICE in 'ix86_compute_frame_layout'.

Kai
Richard Henderson - Aug. 23, 2010, 3:25 p.m.
On 08/22/2010 01:05 PM, Kai Tietz wrote:
> Well, the bootstrap shows that the patch works for C and Fortran. For
> C++, the function 'i386_pe_seh_init' in winnt.c gets called with a
> partially set up cfun. Maybe before reload; at least at some point
> cfun->cfg and other parts of cfun have not been set up, which causes an
> ICE in 'ix86_compute_frame_layout'.

Thunks.  I'll look into it today.


r~

Patch

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index a6434f3..eb776f9 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -33,6 +33,23 @@  along with GCC; see the file COPYING3.  If not see
 #define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
 #endif
 
+#undef TARGET_SEH
+#define TARGET_SEH  TARGET_64BIT_MS_ABI
+
+/* Win64 with SEH cannot represent DRAP stack frames.  Disable its use.
+   Force the use of different mechanisms to allocate aligned local data.  */
+#undef MAX_STACK_ALIGNMENT
+#define MAX_STACK_ALIGNMENT  (TARGET_SEH ? 128 : MAX_OFILE_ALIGNMENT)
+
+/* Support hooks for SEH.  */
+#undef  TARGET_ASM_UNWIND_EMIT
+#define TARGET_ASM_UNWIND_EMIT  i386_pe_seh_unwind_emit
+#undef  TARGET_ASM_UNWIND_EMIT_BEFORE_INSN
+#define TARGET_ASM_UNWIND_EMIT_BEFORE_INSN  false
+#undef  TARGET_ASM_FUNCTION_END_PROLOGUE
+#define TARGET_ASM_FUNCTION_END_PROLOGUE  i386_pe_seh_end_prologue
+#define SUBTARGET_ASM_UNWIND_INIT  i386_pe_seh_init
+
 #undef DEFAULT_ABI
 #define DEFAULT_ABI (TARGET_64BIT ? MS_ABI : SYSV_ABI)
 
@@ -267,15 +284,12 @@  do {						\
    properly.  If we are generating SDB debugging information, this
    will happen automatically, so we only need to handle other cases.  */
 #undef ASM_DECLARE_FUNCTION_NAME
-#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)			\
-  do									\
-    {									\
-      i386_pe_maybe_record_exported_symbol (DECL, NAME, 0);		\
-      if (write_symbols != SDB_DEBUG)					\
-	i386_pe_declare_function_type (FILE, NAME, TREE_PUBLIC (DECL));	\
-      ASM_OUTPUT_FUNCTION_LABEL (FILE, NAME, DECL);			\
-    }									\
-  while (0)
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL) \
+  i386_pe_start_function (FILE, NAME, DECL)
+
+#undef ASM_DECLARE_FUNCTION_SIZE
+#define ASM_DECLARE_FUNCTION_SIZE(FILE,NAME,DECL) \
+  i386_pe_end_function (FILE, NAME, DECL)
 
 /* Add an external function to the list of functions to be declared at
    the end of the file.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a72a432..99da992 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -31,6 +31,7 @@  extern void ix86_setup_frame_addresses (void);
 extern HOST_WIDE_INT ix86_initial_elimination_offset (int, int);
 extern void ix86_expand_prologue (void);
 extern void ix86_expand_epilogue (int);
+extern void ix86_compute_frame_layout (struct ix86_frame *);
 
 extern void ix86_output_addr_vec_elt (FILE *, int);
 extern void ix86_output_addr_diff_elt (FILE *, int, int);
@@ -229,8 +230,14 @@  extern void i386_pe_asm_output_aligned_decl_common (FILE *, tree,
 						    HOST_WIDE_INT,
 						    HOST_WIDE_INT);
 extern void i386_pe_file_end (void);
+extern void i386_pe_start_function (FILE *, const char *, tree);
+extern void i386_pe_end_function (FILE *, const char *, tree);
 extern tree i386_pe_mangle_decl_assembler_name (tree, tree);
 
+extern void i386_pe_seh_init (FILE *);
+extern void i386_pe_seh_end_prologue (FILE *);
+extern void i386_pe_seh_unwind_emit (FILE *, rtx);
+
 /* In winnt-cxx.c and winnt-stubs.c  */
 extern void i386_pe_adjust_class_at_definition (tree);
 extern bool i386_pe_type_dllimport_p (tree);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f1d4402..f1e783e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1818,53 +1818,6 @@  struct GTY(()) stack_local_entry {
   struct stack_local_entry *next;
 };
 
-/* Structure describing stack frame layout.
-   Stack grows downward:
-
-   [arguments]
-					<- ARG_POINTER
-   saved pc
-
-   saved static chain			if ix86_static_chain_on_stack
-
-   saved frame pointer			if frame_pointer_needed
-					<- HARD_FRAME_POINTER
-   [saved regs]
-					<- regs_save_offset
-   [padding0]
-
-   [saved SSE regs]
-					<- sse_regs_save_offset
-   [padding1]          |
-		       |		<- FRAME_POINTER
-   [va_arg registers]  |
-		       |
-   [frame]	       |
-		       |
-   [padding2]	       | = to_allocate
-					<- STACK_POINTER
-  */
-struct ix86_frame
-{
-  int nsseregs;
-  int nregs;
-  int va_arg_size;
-  int red_zone_size;
-  int outgoing_arguments_size;
-  HOST_WIDE_INT frame;
-
-  /* The offsets relative to ARG_POINTER.  */
-  HOST_WIDE_INT frame_pointer_offset;
-  HOST_WIDE_INT hard_frame_pointer_offset;
-  HOST_WIDE_INT stack_pointer_offset;
-  HOST_WIDE_INT reg_save_offset;
-  HOST_WIDE_INT sse_reg_save_offset;
-
-  /* When save_regs_using_mov is set, emit prologue using
-     move instead of push instructions.  */
-  bool save_regs_using_mov;
-};
-
 /* Code model option.  */
 enum cmodel ix86_cmodel;
 /* Asm dialect.  */
@@ -1976,7 +1929,6 @@  static rtx ix86_function_value (const_tree, const_tree, bool);
 static bool ix86_function_value_regno_p (const unsigned int);
 static rtx ix86_static_chain (const_tree, bool);
 static int ix86_function_regparm (const_tree, const_tree);
-static void ix86_compute_frame_layout (struct ix86_frame *);
 static bool ix86_expand_vector_init_one_nonzero (bool, enum machine_mode,
 						 rtx, rtx, int);
 static void ix86_add_new_builtins (int);
@@ -2005,7 +1957,6 @@  static void ix86_set_current_function (tree);
 static unsigned int ix86_minimum_incoming_stack_boundary (bool);
 
 static enum calling_abi ix86_function_abi (const_tree);
-
 
 #ifndef SUBTARGET32_DEFAULT_CPU
 #define SUBTARGET32_DEFAULT_CPU "i386"
@@ -5170,6 +5121,10 @@  ix86_asm_output_function_label (FILE *asm_out_file, const char *fname,
         fprintf (asm_out_file, ASM_LONG " %#x\n", filler_cc);
     }
 
+#ifdef SUBTARGET_ASM_UNWIND_INIT
+  SUBTARGET_ASM_UNWIND_INIT (asm_out_file);
+#endif
+
   ASM_OUTPUT_LABEL (asm_out_file, fname);
 
   /* Output magic byte marker, if hot-patch attribute is set.  */
@@ -8322,7 +8277,7 @@  ix86_builtin_setjmp_frame_value (void)
 
 /* Fill structure ix86_frame about frame of currently computed function.  */
 
-static void
+void
 ix86_compute_frame_layout (struct ix86_frame *frame)
 {
   unsigned int stack_alignment_needed;
@@ -9755,6 +9710,11 @@  ix86_expand_prologue (void)
   /* Emit cld instruction if stringops are used in the function.  */
   if (TARGET_CLD && ix86_current_function_needs_cld)
     emit_insn (gen_cld ());
+
+  /* SEH requires that the prologue end within 256 bytes of the start of
+     the function.  Prevent instruction schedules that would extend that.  */
+  if (TARGET_SEH)
+    emit_insn (gen_blockage ());
 }
 
 /* Emit code to restore REG using a POP insn.  */
@@ -9977,13 +9937,16 @@  ix86_expand_epilogue (int style)
   if (crtl->calls_eh_return && style != 2)
     frame.reg_save_offset -= 2 * UNITS_PER_WORD;
 
+  /* EH_RETURN requires the use of moves to function properly.  */
+  if (crtl->calls_eh_return)
+    restore_regs_via_mov = true;
+  /* SEH requires the use of pops to identify the epilogue.  */
+  else if (TARGET_SEH)
+    restore_regs_via_mov = false;
   /* If we're only restoring one register and sp is not valid then
      using a move instruction to restore the register since it's
      less work than reloading sp and popping the register.  */
-  if (!m->fs.sp_valid && frame.nregs <= 1)
-    restore_regs_via_mov = true;
-  /* EH_RETURN requires the use of moves to function properly.  */
-  else if (crtl->calls_eh_return)
+  else if (!m->fs.sp_valid && frame.nregs <= 1)
     restore_regs_via_mov = true;
   else if (TARGET_EPILOGUE_USING_MOVE
 	   && cfun->machine->use_fast_prologue_epilogue
@@ -10094,6 +10057,13 @@  ix86_expand_epilogue (int style)
     }
   else
     {
+      /* SEH requires that the function end with (1) a stack adjustment
+	 if necessary, (2) a sequence of pops, and (3) a return or
+	 jump instruction.  Prevent insns from the function body from
+	 being scheduled into this sequence.  */
+      if (TARGET_SEH)
+	emit_insn (gen_blockage ());
+
       /* First step is to deallocate the stack frame so that we can
 	 pop the registers.  */
       if (!m->fs.sp_valid)
@@ -31465,6 +31435,7 @@  ix86_enum_va_list (int idx, const char **pname, tree *ptree)
 
   return 0;
 }
+
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 9ef0a1b..186d614 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -486,6 +486,9 @@  extern tree x86_mfence;
 /* For the Windows 64-bit ABI.  */
 #define TARGET_64BIT_MS_ABI (TARGET_64BIT && ix86_cfun_abi () == MS_ABI)
 
+/* This is re-defined by cygming.h.  */
+#define TARGET_SEH 0
+
 /* Available call abi.  */
 enum calling_abi
 {
@@ -2325,6 +2328,55 @@  struct GTY(()) machine_frame_state
   BOOL_BITFIELD realigned : 1;
 };
 
+/* Structure describing stack frame layout.
+   Stack grows downward:
+
+   [arguments]
+					<- ARG_POINTER
+   saved pc
+
+   saved static chain			if ix86_static_chain_on_stack
+
+   saved frame pointer			if frame_pointer_needed
+					<- HARD_FRAME_POINTER
+   [saved regs]
+					<- regs_save_offset
+   [padding0]
+
+   [saved SSE regs]
+					<- sse_regs_save_offset
+   [padding1]          |
+		       |		<- FRAME_POINTER
+   [va_arg registers]  |
+		       |
+   [frame]	       |
+		       |
+   [padding2]	       | = to_allocate
+					<- STACK_POINTER
+  */
+struct ix86_frame
+{
+  int nsseregs;
+  int nregs;
+  int va_arg_size;
+  int red_zone_size;
+  int outgoing_arguments_size;
+  HOST_WIDE_INT frame;
+
+  /* The offsets relative to ARG_POINTER.  */
+  HOST_WIDE_INT frame_pointer_offset;
+  HOST_WIDE_INT hard_frame_pointer_offset;
+  HOST_WIDE_INT stack_pointer_offset;
+  HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* When save_regs_using_mov is set, emit prologue using
+     move instead of push instructions.  */
+  bool save_regs_using_mov;
+};
+
+struct seh_frame_state;
+
 struct GTY(()) machine_function {
   struct stack_local_entry *stack_locals;
   const char *some_ld_name;
@@ -2368,6 +2420,9 @@  struct GTY(()) machine_function {
   /* During prologue/epilogue generation, the current frame state.
      Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
+
+  /* During SEH output, this is non-null.  */
+  struct seh_frame_state * GTY((skip(""))) seh;
 };
 #endif
 
diff --git a/gcc/config/i386/winnt.c b/gcc/config/i386/winnt.c
index 60a8b79..925326f 100644
--- a/gcc/config/i386/winnt.c
+++ b/gcc/config/i386/winnt.c
@@ -730,4 +730,394 @@  i386_pe_file_end (void)
     }
 }
 
+
+/* x64 Structured Exception Handling unwind info.  */
+
+struct seh_frame_state
+{
+  /* SEH uses unsigned offsets for all register saves.  When using a frame
+     pointer, these offsets are relative to FP_BASE = FP_REG - FP_OFFSET.
+     Not to be confusing with too many "offset" values, but this variable
+     contains the CFA offset of the FP_BASE we've chosen for this function.
+     This makes it easy to compare with other CFA offsets we'll be
+     extracting from the RTX_FRAME_RELATED expressions.  */
+  HOST_WIDE_INT fp_base_offset;
+
+  /* Copied from ix86_frame.  */
+  HOST_WIDE_INT sse_reg_save_offset;
+
+  /* The CFA is located at CFA_REG + CFA_OFFSET.  */
+  HOST_WIDE_INT cfa_offset;
+  unsigned int cfa_regno;
+  
+  /* From the SEH docs: "If an FP reg is used, then any unwind code
+     taking an offset must only be used after the FP reg is established
+     in the prolog."  This records if such an event has occurred.  */
+  bool reg_save_emitted;
+};
+
+/* Set up data structures beginning output for SEH.  */
+
+void
+i386_pe_seh_init (FILE *f)
+{
+  struct seh_frame_state *seh;
+  struct ix86_frame frame;
+
+  if (!TARGET_SEH)
+    return;
+
+  /* We cannot support DRAP with SEH.  We turned off support for it by
+     re-defining MAX_STACK_ALIGNMENT when SEH is enabled.  */
+  gcc_assert (!stack_realign_drap);
+
+  ix86_compute_frame_layout (&frame);
+
+  seh = XCNEW (struct seh_frame_state);
+  cfun->machine->seh = seh;
+
+  seh->sse_reg_save_offset = frame.sse_reg_save_offset;
+  seh->cfa_offset = INCOMING_FRAME_SP_OFFSET;
+  seh->cfa_regno = STACK_POINTER_REGNUM;
+
+  fputs ("\t.seh_proc\t", f);
+  assemble_name (f, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (cfun->decl)));
+  fputc ('\n', f);
+}
+
+void
+i386_pe_seh_end_prologue (FILE *f)
+{
+  if (!TARGET_SEH)
+    return;
+
+  XDELETE (cfun->machine->seh);
+  cfun->machine->seh = NULL;
+
+  fputs ("\t.seh_endprologue\n", f);
+}
+
+static void
+i386_pe_seh_fini (FILE *f)
+{
+  if (!TARGET_SEH)
+    return;
+  fputs ("\t.seh_endproc\n", f);
+}
+
+/* Emit an assembler directive to save REG via a PUSH.  */
+
+static void
+seh_emit_push (FILE *f, struct seh_frame_state *seh, rtx reg)
+{
+  unsigned int regno = REGNO (reg);
+
+  gcc_checking_assert (GENERAL_REGNO_P (regno));
+
+  if (seh->cfa_regno == STACK_POINTER_REGNUM)
+    seh->cfa_offset += UNITS_PER_WORD;
+
+  fputs ("\t.seh_pushreg\t", f);
+  print_reg (reg, 0, f);
+  fputc ('\n', f);
+}
+
+/* Emit an assembler directive to save REG at CFA - CFA_OFFSET.  */
+
+static void
+seh_emit_save (FILE *f, struct seh_frame_state *seh,
+	       rtx reg, HOST_WIDE_INT cfa_offset)
+{
+  unsigned int regno = REGNO (reg);
+  HOST_WIDE_INT base_offset, offset;
+
+  if (seh->cfa_regno == STACK_POINTER_REGNUM)
+    base_offset = seh->cfa_offset;
+  else
+    base_offset = seh->fp_base_offset;
+
+  /* Negative save offsets are not supported.  */
+  gcc_assert (base_offset >= cfa_offset);
+  offset = base_offset - cfa_offset;
+
+  fputs ((SSE_REGNO_P (regno) ? "\t.seh_savexmm\t"
+	 : GENERAL_REGNO_P (regno) ?  "\t.seh_savereg\t"
+	 : (gcc_unreachable (), "")), f);
+  print_reg (reg, 0, f);
+  fprintf (f, ", " HOST_WIDE_INT_PRINT_DEC "\n", offset);
+
+  seh->reg_save_emitted = true;
+}
+
+/* Emit an assembler directive to adjust RSP by OFFSET.  */
+
+static void
+seh_emit_stackalloc (FILE *f, struct seh_frame_state *seh,
+		     HOST_WIDE_INT offset)
+{
+  /* We're only concerned with prologue stack allocations, which all
+     are subtractions from the stack pointer.  */
+  gcc_assert (offset < 0);
+  offset = -offset;
+
+  if (seh->cfa_regno == STACK_POINTER_REGNUM)
+    seh->cfa_offset += offset;
+
+  fprintf (f, "\t.seh_stackalloc\t" HOST_WIDE_INT_PRINT_DEC "\n", offset);
+}
+
+/* Emit an assembler directive to set up the frame pointer.  */
+
+static void
+seh_emit_setframe (FILE *f, struct seh_frame_state *seh,
+		   rtx reg, HOST_WIDE_INT cfa_offset,
+		   HOST_WIDE_INT max_offset)
+{
+  HOST_WIDE_INT offset;
+
+  offset = max_offset - seh->cfa_offset;
+  gcc_assert (seh->cfa_regno == STACK_POINTER_REGNUM);
+  gcc_assert (offset >= 0);
+  gcc_assert ((offset & 15) == 0);
+
+  seh->cfa_regno = REGNO (reg);
+  seh->cfa_offset = cfa_offset;
+  seh->fp_base_offset = max_offset;
+
+  fputs ("\t.seh_setframe\t", f);
+  print_reg (reg, 0, f);
+  fprintf (f, ", " HOST_WIDE_INT_PRINT_DEC "\n", offset);
+}
+
+/* Process REG_CFA_ADJUST_CFA for SEH.  */
+
+static void
+seh_cfa_adjust_cfa (FILE *f, struct seh_frame_state *seh, rtx pat)
+{
+  rtx dest, src;
+  HOST_WIDE_INT reg_offset = 0;
+  unsigned int dest_regno;
+
+  dest = SET_DEST (pat);
+  src = SET_SRC (pat);
+
+  if (GET_CODE (src) == PLUS)
+    {
+      reg_offset = INTVAL (XEXP (src, 1));
+      src = XEXP (src, 0);
+    }
+  gcc_assert (src == stack_pointer_rtx);
+  gcc_assert (seh->cfa_regno == STACK_POINTER_REGNUM);
+  dest_regno = REGNO (dest);
+
+  if (dest_regno == STACK_POINTER_REGNUM)
+    seh_emit_stackalloc (f, seh, reg_offset);
+  else if (dest_regno == HARD_FRAME_POINTER_REGNUM)
+    seh_emit_setframe (f, seh, dest, seh->cfa_offset - reg_offset,
+		       seh->sse_reg_save_offset);
+  else
+    gcc_unreachable ();
+}
+
+/* Process REG_CFA_OFFSET for SEH.  */
+
+static void
+seh_cfa_offset (FILE *f, struct seh_frame_state *seh, rtx pat)
+{
+  rtx dest, src;
+  HOST_WIDE_INT reg_offset;
+
+  dest = SET_DEST (pat);
+  src = SET_SRC (pat);
+
+  gcc_assert (MEM_P (dest));
+  dest = XEXP (dest, 0);
+  if (REG_P (dest))
+    reg_offset = 0;
+  else
+    {
+      gcc_assert (GET_CODE (dest) == PLUS);
+      reg_offset = INTVAL (XEXP (dest, 1));
+      dest = XEXP (dest, 0);
+    }
+  gcc_assert (REGNO (dest) == seh->cfa_regno);
+
+  seh_emit_save (f, seh, src, seh->cfa_offset - reg_offset);
+}
+
+/* Process a FRAME_RELATED_EXPR for SEH.  */
+
+static void
+seh_frame_related_expr (FILE *f, struct seh_frame_state *seh, rtx pat)
+{
+  rtx dest, src;
+  HOST_WIDE_INT addend;
+
+  /* See the full loop in dwarf2out_frame_debug_expr.  */
+  if (GET_CODE (pat) == PARALLEL || GET_CODE (pat) == SEQUENCE)
+    {
+      int i, n = XVECLEN (pat, 0), pass, npass;
+
+      npass = (GET_CODE (pat) == PARALLEL ? 2 : 1);
+      for (pass = 0; pass < npass; ++pass)
+	for (i = 0; i < n; ++i)
+	  {
+	    rtx ele = XVECEXP (pat, 0, i);
+
+	    if (GET_CODE (ele) != SET)
+	      continue;
+	    dest = SET_DEST (ele);
+
+	    /* Process each member of the PARALLEL independently.  The first
+	       member is always processed; others only if they are marked.  */
+	    if (i == 0 || RTX_FRAME_RELATED_P (ele))
+	      {
+		/* Evaluate all register saves in the first pass and all
+		   register updates in the second pass.  */
+		if ((MEM_P (dest) ^ pass) || npass == 1)
+		  seh_frame_related_expr (f, seh, ele);
+	      }
+	  }
+      return;
+    }
+
+  dest = SET_DEST (pat);
+  src = SET_SRC (pat);
+
+  switch (GET_CODE (dest))
+    {
+    case REG:
+      switch (GET_CODE (src))
+	{
+	case REG:
+	  /* REG = REG: This should be establishing a frame pointer.  */
+	  gcc_assert (src == stack_pointer_rtx);
+	  gcc_assert (dest == hard_frame_pointer_rtx);
+	  seh_emit_setframe (f, seh, dest, seh->cfa_offset,
+			     seh->sse_reg_save_offset);
+	  break;
+
+	case PLUS:
+	  addend = INTVAL (XEXP (src, 1));
+	  src = XEXP (src, 0);
+	  if (dest == hard_frame_pointer_rtx)
+	    seh_cfa_adjust_cfa (f, seh, pat);
+	  else if (dest == stack_pointer_rtx)
+	    {
+	      gcc_assert (src == stack_pointer_rtx);
+	      seh_emit_stackalloc (f, seh, addend);
+	    }
+	  else
+	    gcc_unreachable ();
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
+      break;
+
+    case MEM:
+      /* A save of some kind.  */
+      dest = XEXP (dest, 0);
+      if (GET_CODE (dest) == PRE_DEC)
+	{
+	  gcc_checking_assert (GET_MODE (src) == Pmode);
+	  gcc_checking_assert (REG_P (src));
+	  seh_emit_push (f, seh, src);
+	}
+      else
+	seh_cfa_offset (f, seh, pat);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* This function looks at a single insn and emits any SEH directives
+   required for unwind of this insn.  */
+
+void
+i386_pe_seh_unwind_emit (FILE *asm_out_file, rtx insn)
+{
+  rtx note, pat;
+  bool handled_one = false;
+  struct seh_frame_state *seh;
+
+  if (!TARGET_SEH)
+    return;
+
+  /* We free the SEH data once done with the prologue.  Ignore those
+     RTX_FRAME_RELATED_P insns that are associated with the epilogue.  */
+  seh = cfun->machine->seh;
+  if (seh == NULL)
+    return;
+
+  if (NOTE_P (insn) || !RTX_FRAME_RELATED_P (insn))
+    return;
+
+  for (note = REG_NOTES (insn); note ; note = XEXP (note, 1))
+    {
+      pat = XEXP (note, 0);
+      switch (REG_NOTE_KIND (note))
+	{
+	case REG_FRAME_RELATED_EXPR:
+	  goto found;
+
+	case REG_CFA_DEF_CFA:
+	case REG_CFA_EXPRESSION:
+	  /* Only emitted with DRAP, which we disable.  */
+	  gcc_unreachable ();
+	  break;
+
+	case REG_CFA_REGISTER:
+	  /* Only emitted in epilogues, which we skip.  */
+	  gcc_unreachable ();
+
+	case REG_CFA_ADJUST_CFA:
+	  if (pat == NULL)
+	    {
+	      pat = PATTERN (insn);
+	      if (GET_CODE (pat) == PARALLEL)
+		pat = XVECEXP (pat, 0, 0);
+	    }
+	  seh_cfa_adjust_cfa (asm_out_file, seh, pat);
+	  handled_one = true;
+	  break;
+
+	case REG_CFA_OFFSET:
+	  if (pat == NULL)
+	    pat = single_set (insn);
+	  seh_cfa_offset (asm_out_file, seh, pat);
+	  handled_one = true;
+	  break;
+
+	default:
+	  break;
+	}
+    }
+  if (handled_one)
+    return;
+  pat = PATTERN (insn);
+ found:
+  seh_frame_related_expr (asm_out_file, seh, pat);
+}
+
+void
+i386_pe_start_function (FILE *f, const char *name, tree decl)
+{
+  i386_pe_maybe_record_exported_symbol (decl, name, 0);
+  if (write_symbols != SDB_DEBUG)
+    i386_pe_declare_function_type (f, name, TREE_PUBLIC (decl));
+  ASM_OUTPUT_FUNCTION_LABEL (f, name, decl);
+}
+
+void
+i386_pe_end_function (FILE *f, const char *name ATTRIBUTE_UNUSED,
+		      tree decl ATTRIBUTE_UNUSED)
+{
+  i386_pe_seh_fini (f);
+}
+
+
 #include "gt-winnt.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 3e38618..b5330e9 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -8730,6 +8730,10 @@  This target hook emits assembly directives required to unwind the
 given instruction.  This is only used when TARGET_UNWIND_INFO is set.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_ASM_UNWIND_EMIT_BEFORE_INSN
+True if the @code{TARGET_ASM_UNWIND_EMIT} hook should be called before the assembly for @var{insn} has been emitted, false if the hook should be called afterward.
+@end deftypevr
+
 @node Exception Region Output
 @subsection Assembler Commands for Exception Regions
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6358916..0ca77bc 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8720,6 +8720,8 @@  This target hook emits assembly directives required to unwind the
 given instruction.  This is only used when TARGET_UNWIND_INFO is set.
 @end deftypefn
 
+@hook TARGET_ASM_UNWIND_EMIT_BEFORE_INSN
+
 @node Exception Region Output
 @subsection Assembler Commands for Exception Regions
 
diff --git a/gcc/final.c b/gcc/final.c
index 73c6069..06ebc17 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -2655,7 +2655,8 @@  final_scan_insn (rtx insn, FILE *file, int optimize ATTRIBUTE_UNUSED,
 	/* ??? This will put the directives in the wrong place if
 	   get_insn_template outputs assembly directly.  However calling it
 	   before get_insn_template breaks if the insns is split.  */
-	if (targetm.asm_out.unwind_emit)
+	if (targetm.asm_out.unwind_emit_before_insn
+	    && targetm.asm_out.unwind_emit)
 	  targetm.asm_out.unwind_emit (asm_out_file, insn);
 
 	if (CALL_P (insn))
@@ -2713,6 +2714,10 @@  final_scan_insn (rtx insn, FILE *file, int optimize ATTRIBUTE_UNUSED,
 	  dwarf2out_frame_debug (insn, true);
 #endif
 
+	if (!targetm.asm_out.unwind_emit_before_insn
+	    && targetm.asm_out.unwind_emit)
+	  targetm.asm_out.unwind_emit (asm_out_file, insn);
+
 	current_output_insn = debug_insn = 0;
       }
     }
diff --git a/gcc/target.def b/gcc/target.def
index 46e3ef7..fafa351 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -152,6 +152,13 @@  DEFHOOK
  void, (FILE *stream, rtx insn),
  NULL)
 
+DEFHOOKPOD
+(unwind_emit_before_insn,
+ "True if the @code{TARGET_ASM_UNWIND_EMIT} hook should be called before\
+ the assembly for @var{insn} has been emitted, false if the hook should\
+ be called afterward.",
+ bool, true)
+
 /* Output an internal label.  */
 DEFHOOK
 (internal_label,
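
To see what the CFA bookkeeping in seh_emit_push and seh_emit_stackalloc from the winnt.c part of the patch amounts to, here is a standalone sketch that replays a prologue and prints the directives while tracking the CFA offset. It is a simplification under stated assumptions: the type seh_sim and the sim_* functions are hypothetical, registers are passed as strings rather than rtx, the word size is taken to be 8, the initial CFA offset is assumed to be one word (the return address), and only the case where the stack pointer is the CFA register is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define WORD_SIZE 8   /* assumed UNITS_PER_WORD for x86-64 */

/* Hypothetical stand-in for the CFA part of seh_frame_state.  */
struct seh_sim
{
  long cfa_offset;    /* CFA = RSP + cfa_offset */
};

/* Start of function: only the return address is on the stack.  */
static void
sim_init (struct seh_sim *s)
{
  s->cfa_offset = WORD_SIZE;
}

/* A push moves RSP down by one word, so the CFA is one word further
   away from RSP afterwards.  */
static void
sim_push (FILE *f, struct seh_sim *s, const char *reg)
{
  assert (reg != NULL);
  s->cfa_offset += WORD_SIZE;
  fprintf (f, "\t.seh_pushreg\t%s\n", reg);
}

/* RTL_ADDEND is the constant added to RSP; prologue allocations are
   subtractions, so it must be negative.  Returns 0 on success and -1
   if the addend is not a prologue-style allocation.  */
static int
sim_stackalloc (FILE *f, struct seh_sim *s, long rtl_addend)
{
  if (rtl_addend >= 0)
    return -1;
  s->cfa_offset += -rtl_addend;
  fprintf (f, "\t.seh_stackalloc\t%ld\n", -rtl_addend);
  return 0;
}
```

Replaying a push of one register followed by a 32-byte allocation moves the tracked CFA offset from 8 to 16 and then to 48, which is the value later register saves would be measured against.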