diff mbox

x86 interrupt attribute

Message ID CAFULd4Y-A6vMaXJ1+EzkFexJ7xeWb3ri=XpHAK1UCXGFB0si4w@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak Oct. 4, 2015, 10:29 a.m. UTC
On Sun, Oct 4, 2015 at 7:23 AM, Yulia Koval <vaalfreja@gmail.com> wrote:
> Hi,
>
> Here is the last version of the patch. Regtested/bootstraped for
> Linux/i686 and Linux/x86_64.
>
> Date: Fri, 4 Sep 2015 08:53:23 -0700
> Subject: [PATCH] Implement x86 interrupt attribute
>
> The interrupt and exception handlers are called by x86 processors.  X86
> hardware pushes information onto stack and calls the handler.  The
> requirements are
>
> 1. Both interrupt and exception handlers must use the 'IRET' instruction,
> instead of the 'RET' instruction, to return from the handlers.
> 2. All registers are callee-saved in interrupt and exception handlers.
> 3. The difference between interrupt and exception handlers is the
> exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
> instruction.
>
> The design goals of interrupt and exception handlers for x86 processors
> are:
>
> 1. Support both 32-bit and 64-bit modes.
> 2. Flexible for compilers to optimize.
> 3. Easy to use by programmers.
>
> To implement interrupt and exception handlers for x86 processors, a
> compiler should support:
>
> 'interrupt' attribute
>
> Use this attribute to indicate that the specified function with
> mandatory arguments is an interrupt or exception handler.  The compiler
> generates function entry and exit sequences suitable for use in an
> interrupt handler when this attribute is present.  The 'IRET' instruction,
> instead of the 'RET' instruction, is used to return from interrupt or
> exception handlers.  All registers, except for the EFLAGS register which
> is restored by the 'IRET' instruction, are preserved by the compiler.
>
> Any interruptible-without-stack-switch code must be compiled with
> -mno-red-zone since interrupt handlers can and will, because of the
> hardware design, touch the red zone.
>
> 1. interrupt handler must be declared with a mandatory pointer argument:
>
> struct interrupt_frame;
>
> __attribute__ ((interrupt))
> void
> f (struct interrupt_frame *frame)
> {
> ...
> }
>
> and user must properly define the structure the pointer pointing to.
>
> 2. exception handler:
>
> The exception handler is very similar to the interrupt handler with
> a different mandatory function signature:
>
> typedef unsigned long long int uword_t;
> typedef unsigned int uword_t;
>
> struct interrupt_frame;
>
> __attribute__ ((interrupt))
> void
> f (struct interrupt_frame *frame, uword_t error_code)
> {
> ...
> }
>
> and compiler pops the error code off stack before the 'IRET' instruction.
>
> The exception handler should only be used for exceptions which push an
> error code and all other exceptions must use the interrupt handler.
> The system will crash if the wrong handler is used.
>
> To be feature complete, compiler may implement the optional
> 'no_caller_saved_registers' attribute:
>
> Use this attribute to indicate that the specified function has no
> caller-saved registers.  That is, all registers are callee-saved.
> The compiler generates proper function entry and exit sequences to
> save and restore any modified registers.
>
> The user can call functions specified with 'no_caller_saved_registers'
> attribute from an interrupt handler without saving and restoring all
> call clobbered registers.

Looking a bit deeper into the code, it looks that we want to realign
the stack in the interrupt handler. Let's assume that interrupt
handler is calling some other function that saves SSE vector regs to
the stack. According to the x86 ABI, incoming stack of the called
function is assumed to be aligned to 16 bytes. But, interrupt handler
violates this assumption, since the stack could be aligned to only 4
bytes for 32bit and 8 bytes for 64bit targets. Entering the called
function with stack, aligned to less than 16 bytes will certainly
violate ABI.

So, it looks to me that we need to realign the stack in the interrupt
handler unconditionally to 16bytes. In this case, we also won't need
the following changes:

@@ -11284,9 +11474,29 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
       /* The only ABI that has saved SSE registers (Win64) also has a
          16-byte aligned default stack, and thus we don't need to be
      within the re-aligned local stack frame to save them.  */
-      gcc_assert (INCOMING_STACK_BOUNDARY >= 128);
-      offset = ROUND_UP (offset, 16);
-      offset += frame->nsseregs * 16;
+      unsigned int incoming_stack_boundary
+    = MAX (crtl->parm_stack_boundary, ix86_incoming_stack_boundary);
+      unsigned int vec_regsize;
+
+      /* We must save full vector registers if there are no
+     caller-saved registers.  */
+      if (cfun->machine->no_caller_saved_registers)
+    vec_regsize = (TARGET_AVX512F ? 64 : (TARGET_AVX ? 32 : 16));
+      else
+    vec_regsize = 16;
+
+      if (cfun->machine->func_type == TYPE_NORMAL)
+    {
+      /* Incoming stack may be 32-bit aligned in 32-bit mode.  */
+      gcc_assert (!TARGET_64BIT || incoming_stack_boundary >= 128);
+      /* Don't over-align vector register save area.  */
+      incoming_stack_boundary /= BITS_PER_UNIT;
+      if (incoming_stack_boundary > vec_regsize)
+        incoming_stack_boundary = vec_regsize;
+      offset = ROUND_UP (offset, incoming_stack_boundary);
+    }
+
+      offset += frame->nsseregs * vec_regsize;
     }
   frame->sse_reg_save_offset = offset;

...

@@ -11513,8 +11727,28 @@ ix86_emit_save_reg_using_mov (machine_mode
mode, unsigned int regno,
   addr = choose_baseaddr (cfa_offset);
   mem = gen_frame_mem (mode, addr);

-  /* For SSE saves, we need to indicate the 128-bit alignment.  */
-  set_mem_align (mem, GET_MODE_ALIGNMENT (mode));
+  /* For SSE saves, we need to indicate the 128-bit alignment.  In
+     interrupt handler, stack is aligned to word_mode.  We can't use
+     gen_sse_storeups since RTX_FRAME_RELATED_P is set and
+     dwarf2out_flush_queued_reg_saves doesn't like UNSPEC_STOREU.
+     Also gen_sse_storeups doesn't cover AVX nor AVX512.  */
+  unsigned int incoming_stack_boundary;
+  if (cfun->machine->func_type == TYPE_NORMAL)
+    {
+      incoming_stack_boundary
+    = MAX (crtl->parm_stack_boundary, ix86_incoming_stack_boundary);
+      if (!TARGET_64BIT || incoming_stack_boundary >= 128)
+    {
+      /* Don't over-align vector register save slot.  */
+      if (incoming_stack_boundary > GET_MODE_ALIGNMENT (mode))
+        incoming_stack_boundary = GET_MODE_ALIGNMENT (mode);
+    }
+      else
+    gcc_unreachable ();
+    }
+  else
+    incoming_stack_boundary = GET_MODE_ALIGNMENT (word_mode);
+  set_mem_align (mem, incoming_stack_boundary);

   insn = emit_move_insn (mem, reg);
   RTX_FRAME_RELATED_P (insn) = 1;

and

If we want realignment to work for x86_64, we can also enable
-mstack-realign also for x86_64, as was recently requested from
Windows people. (I can't find the relevant PR ATM...)


What also bothers me is the following change:

@@ -11248,11 +11433,16 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
        = !expensive_function_p (count);
     }

+  /* We must use move to save bound and mask registers.  */
   frame->save_regs_using_mov
-    = (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
-       /* If static stack checking is enabled and done with probes,
-      the registers need to be saved before allocating the frame.  */
-       && flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
+    = (frame->nbndregs > 0
+       || frame->nmaskregs > 0
+       || (TARGET_PROLOGUE_USING_MOVE
+       && cfun->machine->use_fast_prologue_epilogue
+       /* If static stack checking is enabled and done with probes,
+          the registers need to be saved before allocating the
+          frame.  */
+       && flag_stack_check != STATIC_BUILTIN_STACK_CHECK));

In the above code, we force the whole frame to save_regs_using_mov,
when we have to save BND and MASK registers. However, AFAICS,
cygwin/mingw is able to save SSE registers (that also don't have PUSH
insns) without this requirement. We should use the same approach to
save BND and MASK registers.

Uros.

Comments

H.J. Lu Oct. 4, 2015, 6:15 p.m. UTC | #1
On Sun, Oct 4, 2015 at 3:29 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Oct 4, 2015 at 7:23 AM, Yulia Koval <vaalfreja@gmail.com> wrote:
>> Hi,
>>
>> Here is the last version of the patch. Regtested/bootstraped for
>> Linux/i686 and Linux/x86_64.
>>
>> Date: Fri, 4 Sep 2015 08:53:23 -0700
>> Subject: [PATCH] Implement x86 interrupt attribute
>>
>> The interrupt and exception handlers are called by x86 processors.  X86
>> hardware pushes information onto stack and calls the handler.  The
>> requirements are
>>
>> 1. Both interrupt and exception handlers must use the 'IRET' instruction,
>> instead of the 'RET' instruction, to return from the handlers.
>> 2. All registers are callee-saved in interrupt and exception handlers.
>> 3. The difference between interrupt and exception handlers is the
>> exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
>> instruction.
>>
>> The design goals of interrupt and exception handlers for x86 processors
>> are:
>>
>> 1. Support both 32-bit and 64-bit modes.
>> 2. Flexible for compilers to optimize.
>> 3. Easy to use by programmers.
>>
>> To implement interrupt and exception handlers for x86 processors, a
>> compiler should support:
>>
>> 'interrupt' attribute
>>
>> Use this attribute to indicate that the specified function with
>> mandatory arguments is an interrupt or exception handler.  The compiler
>> generates function entry and exit sequences suitable for use in an
>> interrupt handler when this attribute is present.  The 'IRET' instruction,
>> instead of the 'RET' instruction, is used to return from interrupt or
>> exception handlers.  All registers, except for the EFLAGS register which
>> is restored by the 'IRET' instruction, are preserved by the compiler.
>>
>> Any interruptible-without-stack-switch code must be compiled with
>> -mno-red-zone since interrupt handlers can and will, because of the
>> hardware design, touch the red zone.
>>
>> 1. interrupt handler must be declared with a mandatory pointer argument:
>>
>> struct interrupt_frame;
>>
>> __attribute__ ((interrupt))
>> void
>> f (struct interrupt_frame *frame)
>> {
>> ...
>> }
>>
>> and user must properly define the structure the pointer pointing to.
>>
>> 2. exception handler:
>>
>> The exception handler is very similar to the interrupt handler with
>> a different mandatory function signature:
>>
>> typedef unsigned long long int uword_t;
>> typedef unsigned int uword_t;
>>
>> struct interrupt_frame;
>>
>> __attribute__ ((interrupt))
>> void
>> f (struct interrupt_frame *frame, uword_t error_code)
>> {
>> ...
>> }
>>
>> and compiler pops the error code off stack before the 'IRET' instruction.
>>
>> The exception handler should only be used for exceptions which push an
>> error code and all other exceptions must use the interrupt handler.
>> The system will crash if the wrong handler is used.
>>
>> To be feature complete, compiler may implement the optional
>> 'no_caller_saved_registers' attribute:
>>
>> Use this attribute to indicate that the specified function has no
>> caller-saved registers.  That is, all registers are callee-saved.
>> The compiler generates proper function entry and exit sequences to
>> save and restore any modified registers.
>>
>> The user can call functions specified with 'no_caller_saved_registers'
>> attribute from an interrupt handler without saving and restoring all
>> call clobbered registers.
>
> Looking a bit deeper into the code, it looks that we want to realign
> the stack in the interrupt handler. Let's assume that interrupt
> handler is calling some other function that saves SSE vector regs to
> the stack. According to the x86 ABI, incoming stack of the called
> function is assumed to be aligned to 16 bytes. But, interrupt handler
> violates this assumption, since the stack could be aligned to only 4
> bytes for 32bit and 8 bytes for 64bit targets. Entering the called
> function with stack, aligned to less than 16 bytes will certainly
> violate ABI.
>
> So, it looks to me that we need to realign the stack in the interrupt
> handler unconditionally to 16bytes. In this case, we also won't need
> the following changes:
>

Current stack alignment implementation requires at least
one, maybe two, scratch registers:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841

Extend it to the interrupt handler, which doesn't have any scratch
registers may require significant changes in backend as well as
register allocator.
Uros Bizjak Oct. 4, 2015, 8 p.m. UTC | #2
On Sun, Oct 4, 2015 at 8:15 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>> Looking a bit deeper into the code, it looks that we want to realign
>> the stack in the interrupt handler. Let's assume that interrupt
>> handler is calling some other function that saves SSE vector regs to
>> the stack. According to the x86 ABI, incoming stack of the called
>> function is assumed to be aligned to 16 bytes. But, interrupt handler
>> violates this assumption, since the stack could be aligned to only 4
>> bytes for 32bit and 8 bytes for 64bit targets. Entering the called
>> function with stack, aligned to less than 16 bytes will certainly
>> violate ABI.
>>
>> So, it looks to me that we need to realign the stack in the interrupt
>> handler unconditionally to 16bytes. In this case, we also won't need
>> the following changes:
>>
>
> Current stack alignment implementation requires at least
> one, maybe two, scratch registers:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841
>
> Extend it to the interrupt handler, which doesn't have any scratch
> registers may require significant changes in backend as well as
> register allocator.

But without realignment, the handler is unusable for anything but
simple functions. The handler will crash when called function will try
to save vector reg to stack.

Uros.
H.J. Lu Oct. 4, 2015, 8:51 p.m. UTC | #3
On Sun, Oct 4, 2015 at 1:00 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Oct 4, 2015 at 8:15 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>> Looking a bit deeper into the code, it looks that we want to realign
>>> the stack in the interrupt handler. Let's assume that interrupt
>>> handler is calling some other function that saves SSE vector regs to
>>> the stack. According to the x86 ABI, incoming stack of the called
>>> function is assumed to be aligned to 16 bytes. But, interrupt handler
>>> violates this assumption, since the stack could be aligned to only 4
>>> bytes for 32bit and 8 bytes for 64bit targets. Entering the called
>>> function with stack, aligned to less than 16 bytes will certainly
>>> violate ABI.
>>>
>>> So, it looks to me that we need to realign the stack in the interrupt
>>> handler unconditionally to 16bytes. In this case, we also won't need
>>> the following changes:
>>>
>>
>> Current stack alignment implementation requires at least
>> one, maybe two, scratch registers:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841
>>
>> Extend it to the interrupt handler, which doesn't have any scratch
>> registers may require significant changes in backend as well as
>> register allocator.
>
> But without realignment, the handler is unusable for anything but
> simple functions. The handler will crash when called function will try
> to save vector reg to stack.
>

We can use unaligned load and store to avoid crash.
Uros Bizjak Oct. 4, 2015, 9:07 p.m. UTC | #4
On Sun, Oct 4, 2015 at 10:51 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Oct 4, 2015 at 1:00 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sun, Oct 4, 2015 at 8:15 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>>> Looking a bit deeper into the code, it looks that we want to realign
>>>> the stack in the interrupt handler. Let's assume that interrupt
>>>> handler is calling some other function that saves SSE vector regs to
>>>> the stack. According to the x86 ABI, incoming stack of the called
>>>> function is assumed to be aligned to 16 bytes. But, interrupt handler
>>>> violates this assumption, since the stack could be aligned to only 4
>>>> bytes for 32bit and 8 bytes for 64bit targets. Entering the called
>>>> function with stack, aligned to less than 16 bytes will certainly
>>>> violate ABI.
>>>>
>>>> So, it looks to me that we need to realign the stack in the interrupt
>>>> handler unconditionally to 16bytes. In this case, we also won't need
>>>> the following changes:
>>>>
>>>
>>> Current stack alignment implementation requires at least
>>> one, maybe two, scratch registers:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841
>>>
>>> Extend it to the interrupt handler, which doesn't have any scratch
>>> registers may require significant changes in backend as well as
>>> register allocator.
>>
>> But without realignment, the handler is unusable for anything but
>> simple functions. The handler will crash when called function will try
>> to save vector reg to stack.
>>
>
> We can use unaligned load and store to avoid crash.

Oh, sorry, I meant "called function will crash", like:

-> interrupt when %rsp = 0x...8 ->
-> interrupt handler ->
-> calls some function that tries to save xmm reg to stack
-> crash in the called function

Uros.
Mike Stump Oct. 5, 2015, 8:58 a.m. UTC | #5
On Oct 4, 2015, at 11:15 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Current stack alignment implementation requires at least
> one, maybe two, scratch registers:

So, I have some cases where I need scratch registers as well.  I always save 2 registers and they go first (and restore last), so I can always use them.
diff mbox

Patch

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -875,10 +875,18 @@ 
     case MODE_V16SF:
     case MODE_V8SF:
     case MODE_V4SF:
-      if (TARGET_AVX
+      /* We must support misaligned SSE load and store in interrupt
+         handler or there are no caller-saved registers and we are
+         in 32-bit mode since ix86_emit_save_reg_using_mov generates
+         the normal *mov<mode>_internal pattern to save and restore
+         SSE registers with misaligned stack.  */
+      if ((TARGET_AVX
+           || cfun->machine->func_type != TYPE_NORMAL
+           || (!TARGET_64BIT
+           && cfun->machine->no_caller_saved_registers))
           && (misaligned_operand (operands[0], <MODE>mode)
           || misaligned_operand (operands[1], <MODE>mode)))
-        return "vmovups\t{%1, %0|%0, %1}";
+        return "%vmovups\t{%1, %0|%0, %1}";
       else
         return "%vmovaps\t{%1, %0|%0, %1}";