Patchwork PATCH RFA: Split stack [4/7]: REF_CFA_TEMPORARY

login
register
mail settings
Submitter Ian Taylor
Date Sept. 22, 2010, 10:03 p.m.
Message ID <mcrmxr977wr.fsf@google.com>
Download mbox | patch
Permalink /patch/65465/
State New
Headers show

Comments

Ian Taylor - Sept. 22, 2010, 10:03 p.m.
This is the fourth of the -fsplit-stack patches.  This patch adds a new
feature to the DWARF frame note.  If an insn has a new REG_CFA_TEMPORARY
note, then the CFA is set to the value of the note only for that insn.
The i386 backend uses this new feature to set the unwind information for
a special ret instruction that it uses to permit the processor to do
call/return prediction.

This patch is in the middle-end and as such does not require approval,
but as it is somewhat unrelated to the rest of the patches I am
separating it out for possible comment.

Ian


2010-09-21  Ian Lance Taylor  <iant@google.com>

	* reg-notes.def (CFA_TEMPORARY): New note.
	* dwarf2out.c (dwarf2out_frame_debug): Handle REG_CFA_TEMPORARY.
	(dwarf2out_cfi_begin_epilogue): Call
	dwarf2out_frame_debug_remember_state.
	(dwarf2out_frame_debug_remember_state): New static function,
	broken out of dwarf2out_cfi_begin_epilogue.
Richard Henderson - Sept. 23, 2010, 5:28 p.m.
On 09/22/2010 03:03 PM, Ian Lance Taylor wrote:
> This is the fourth of the -fsplit-stack patches.  This patch adds a new
> feature to the DWARF frame note.  If an insn has a new REG_CFA_TEMPORARY
> note, then the CFA is set to the value of the note only for that insn.
> The i386 backend uses this new feature to set the unwind information for
> a special ret instruction that it uses to permit the processor to do
> call/return prediction.

Er, I'm not keen on this.  In particular, how does 

> +	    dwarf2out_frame_debug_remember_state ();
> +	    dwarf2out_frame_debug_def_cfa (XEXP (note, 0), label);
> +	  }
> +	else
> +	  dwarf2out_frame_debug_restore_state ();

interact with the save/restore that we already emit for epilogues?
Seems to me that it's more-or-less redundant.

An alternative implementation is to simply add a REG_CFA_{ADJUST,DEF}_CFA
note to a blockage insn immediately before the special return.


r~
Richard Henderson - Sept. 23, 2010, 5:31 p.m.
On 09/23/2010 10:28 AM, Richard Henderson wrote:
> An alternative implementation is to simply add a REG_CFA_{ADJUST,DEF}_CFA
> note to a blockage insn immediately before the special return.

... or more properly, now that i think about it, to the insn that
does the cfa adjustment.  I really can't tell what you're wanting
to do with this REG_CFA_TEMP thing...


r~
Richard Henderson - Sept. 23, 2010, 7:10 p.m.
On 09/22/2010 03:03 PM, Ian Lance Taylor wrote:
> This is the fourth of the -fsplit-stack patches.  This patch adds a new
> feature to the DWARF frame note.  If an insn has a new REG_CFA_TEMPORARY
> note, then the CFA is set to the value of the note only for that insn.
> The i386 backend uses this new feature to set the unwind information for
> a special ret instruction that it uses to permit the processor to do
> call/return prediction.

Ok, scratch the previous sub-thread.  I've looked through 6/7 to
see how you're actually using it.

That said, I still don't like it the note.

I think this can be entirely handled within the special return insn.

{
  char *lab;

  lab = dwarf2out_cfi_label (false);
  dwarf2out_frame_debug_remember_state (lab);
  dwarf2out_frame_debug_def_cfa (lab,
    plus_constant (stack_pointer_rtx, UNITS_PER_WORD));
  
  // output ret

  lab = dwarf2out_cfi_label (false);
  dwarf2out_frame_debug_restore_state (lab);

  return "";
}

(with the obvious tweeks to the dwarf2out_frame_debug_* functions.

I also wonder if it wouldn't be better to merge the call and return
insns into the same pattern.  This would guarantee that nothing
sneeks in, when we're playing games with ret_addr+1.


r~
Ian Taylor - Sept. 24, 2010, 5:33 p.m.
Richard Henderson <rth@redhat.com> writes:

> On 09/22/2010 03:03 PM, Ian Lance Taylor wrote:
>> This is the fourth of the -fsplit-stack patches.  This patch adds a new
>> feature to the DWARF frame note.  If an insn has a new REG_CFA_TEMPORARY
>> note, then the CFA is set to the value of the note only for that insn.
>> The i386 backend uses this new feature to set the unwind information for
>> a special ret instruction that it uses to permit the processor to do
>> call/return prediction.
>
> Ok, scratch the previous sub-thread.  I've looked through 6/7 to
> see how you're actually using it.
>
> That said, I still don't like it the note.

Actually, I'm going to withdraw this part of the patch.  It was
necessary at one time to unwind the stack, but now the assembly code
uses slightly tricky unwind info to skip the single return instruction
entirely during the unwind process.  There are still some issues with
the unwind information in 32-bit mode, but they aren't helped by this
patch.  So please consider this patch withdrawn, with the corresponding
changes to patch 5 (x86 backend).


> I also wonder if it wouldn't be better to merge the call and return
> insns into the same pattern.  This would guarantee that nothing
> sneeks in, when we're playing games with ret_addr+1.

I'm a bit reluctant to do that now, since there seem to be several
different variants of the x86 call instruction.  It's certainly a
possibility for later.  There is very little opportunity for anything to
slip in, since the return instruction is unspec_volatile and is
immediately followed by the label which is the target of the branch of
there is enough stack space.

Ian
Richard Henderson - Sept. 24, 2010, 5:54 p.m.
On 09/24/2010 10:33 AM, Ian Lance Taylor wrote:
> Actually, I'm going to withdraw this part of the patch.  It was
> necessary at one time to unwind the stack, but now the assembly code
> uses slightly tricky unwind info to skip the single return instruction
> entirely during the unwind process.

Actually -fasynchronous-unwind-tables needs it.

If the interrupt that samples the call-stack happens at
that lone return insn, we'll have wrong unwind info.

> I'm a bit reluctant to do that now, since there seem to be several
> different variants of the x86 call instruction.  It's certainly a
> possibility for later.

There's really no need to consider all the different rtl call
variants.  There's only one actual direct call insn.

Ignoring the unwind info and return stack popping for clarity,
the insn pattern could be as simple as

(define_insn "morestack"
  [(unspec_v [(match_operand 0)] UNSPECV_MORESTACK)]
  ""
  "call __morestack\;ret"
  )

That said, I wonder about the relative impact of the call-stack
branch (mis)prediction vs the double-branch and extra icache fetches.


r~
Ian Taylor - Sept. 24, 2010, 7:30 p.m.
Richard Henderson <rth@redhat.com> writes:

> On 09/24/2010 10:33 AM, Ian Lance Taylor wrote:
>> Actually, I'm going to withdraw this part of the patch.  It was
>> necessary at one time to unwind the stack, but now the assembly code
>> uses slightly tricky unwind info to skip the single return instruction
>> entirely during the unwind process.
>
> Actually -fasynchronous-unwind-tables needs it.
>
> If the interrupt that samples the call-stack happens at
> that lone return insn, we'll have wrong unwind info.

That can certainly happen and should be fixed at some point, but it is
not a significant issue in practice because we are always within the
unwind-regime of the start of the function at this point, and this gives
us a valid stack unwind.

>
>> I'm a bit reluctant to do that now, since there seem to be several
>> different variants of the x86 call instruction.  It's certainly a
>> possibility for later.
>
> There's really no need to consider all the different rtl call
> variants.  There's only one actual direct call insn.
>
> Ignoring the unwind info and return stack popping for clarity,
> the insn pattern could be as simple as
>
> (define_insn "morestack"
>   [(unspec_v [(match_operand 0)] UNSPECV_MORESTACK)]
>   ""
>   "call __morestack\;ret"
>   )

Oh yeah, good point.

> That said, I wonder about the relative impact of the call-stack
> branch (mis)prediction vs the double-branch and extra icache fetches.

It's not just one call-stack misprediction, it means that the entire
call stack get mispredicted.  I think that will outweigh the extra
icache fetch.

Ian
Richard Henderson - Sept. 24, 2010, 9:10 p.m.
On 09/24/2010 12:30 PM, Ian Lance Taylor wrote:
> That can certainly happen and should be fixed at some point, but it is
> not a significant issue in practice because we are always within the
> unwind-regime of the start of the function at this point, and this gives
> us a valid stack unwind.

Oh yeah, we're still before the prologue, so we should be
exactly at the entry point's unwind info.  Fantastic.


r~

Patch

Index: gcc/reg-notes.def
===================================================================
--- gcc/reg-notes.def	(revision 164490)
+++ gcc/reg-notes.def	(working copy)
@@ -165,6 +165,13 @@  REG_NOTE (CFA_RESTORE)
    to the argument, if it is a MEM, it is ignored.  */
 REG_NOTE (CFA_SET_VDRAP)
 
+/* Temporarily set the CFA for just one insn.  This saves the old
+   state, sets the CFA, and then restores the state after the insn.
+   The pattern for this note is the new CFA value.  This is used by
+   the split stack code to support unwinding through the call into the
+   split stack routine.  */
+REG_NOTE (CFA_TEMPORARY)
+
 /* Indicates that REG holds the exception context for the function.
    This context is shared by inline functions, so the code to acquire
    the real exception context is delayed until after inlining.  */
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	(revision 164490)
+++ gcc/dwarf2out.c	(working copy)
@@ -473,6 +473,7 @@  static void output_call_frame_info (int)
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_frame_debug_remember_state (void);
 
 /* Support for complex CFA locations.  */
 static void output_cfa_loc (dw_cfi_ref);
@@ -2832,6 +2833,17 @@  dwarf2out_frame_debug (rtx insn, bool af
 	handled_one = true;
 	break;
 
+      case REG_CFA_TEMPORARY:
+	if (!after_p)
+	  {
+	    dwarf2out_frame_debug_remember_state ();
+	    dwarf2out_frame_debug_def_cfa (XEXP (note, 0), label);
+	  }
+	else
+	  dwarf2out_frame_debug_restore_state ();
+	handled_one = true;
+	break;
+
       default:
 	break;
       }
@@ -2924,9 +2936,17 @@  dwarf2out_cfi_begin_epilogue (rtx insn)
     }
   emit_note_before (NOTE_INSN_CFA_RESTORE_STATE, i);
 
+  dwarf2out_frame_debug_remember_state ();
+}
+
+/* Remember the current state.  */
+
+static void
+dwarf2out_frame_debug_remember_state (void)
+{
   emit_cfa_remember = true;
 
-  /* And emulate the state save.  */
+  /* Emulate the state save.  */
   gcc_assert (!cfa_remember.in_use);
   cfa_remember = cfa;
   cfa_remember.in_use = 1;