Patchwork [RS6000] asynch exceptions and unwind info

Submitter Alan Modra
Date July 27, 2011, 5:30 a.m.
Message ID <20110727053045.GO1081@bubble.grove.modra.org>
Permalink /patch/106982/
State New

Comments

Alan Modra - July 27, 2011, 5:30 a.m.
Hi David,
  I've been looking into what we need to do to support unwinding from
async signal handlers.  I've implemented unwind info generation for
.glink in the linker, but to keep the ppc64 .glink unwind info simple
I've assumed that frob_update_context is still used.

We still have some difficulties related to r2 tracking on
ppc64. frob_update_context doesn't quite do the right thing for
async unwinding.  A typical (no-r11) plt call stub looks like

 addis r12,2,off@ha
 std 2,40(1)
 ld 11,off@l(12)
 mtctr 11
 ld 2,off+8@l(12)
 bctr

or, when the offset from r2 to the function descriptor is small

 std 2,40(1)
 ld 11,off(2)
 mtctr 11
 ld 2,off+8(2)
 bctr

Now if we're stopped before the save of r2 we obviously don't want the
unwinder to restore r2 from 40(1), but that's exactly what the current
unwinder does.

Also, there is a one-insn window where frob_update_context may do the
wrong thing for gcc-generated calls via a function pointer, which
typically look like

 ld 0,0(r)
 std 2,40(1)
 mtctr 0
 ld 2,8(r)
 bctrl
 ld 2,40(1)

Here, if we are stopped after the "ld 2,8(r)", i.e. on the bctrl, then
r2 needs to be restored from 40(1).

The following patch fixes these two issues.  Ideally what I'd like to
do is have ld and gcc emit accurate r2 tracking unwind info and
dispense with hacks like frob_update_context.  If ld did emit accurate
unwind info for .glink, then the justification for frob_update_context
disappears.  The difficulty then is backwards compatibility.  You'd
need a way for the gcc unwinder to handle a mix of old code (that
needs frob_update_context) with new code (that doesn't).  One way to
accomplish this would be to set a dummy reg with initial CIE dwarf
instructions, then test this reg in frob_update_context.

Bootstrapped and regression tested powerpc64-linux.

	* config/rs6000/linux-unwind.h (frob_update_context <__powerpc64__>):
	Leave r2 REG_UNSAVED if stopped on the instruction that saves r2
	in a plt call stub.  Do restore r2 if stopped on bctrl.
David Edelsohn - July 27, 2011, 11:38 p.m.
On Wed, Jul 27, 2011 at 1:30 AM, Alan Modra <amodra@gmail.com> wrote:

>        * config/rs6000/linux-unwind.h (frob_update_context <__powerpc64__>):
>        Leave r2 REG_UNSAVED if stopped on the instruction that saves r2
>        in a plt call stub.  Do restore r2 if stopped on bctrl.

Okay.

Thanks, David
Alan Modra - July 28, 2011, 7:27 a.m.
On Wed, Jul 27, 2011 at 03:00:45PM +0930, Alan Modra wrote:
> Ideally what I'd like to
> do is have ld and gcc emit accurate r2 tracking unwind info and
> dispense with hacks like frob_update_context.  If ld did emit accurate
> unwind info for .glink, then the justification for frob_update_context
> disappears.

For the record, this statement of mine doesn't make sense.  A .glink
stub doesn't make a frame, so a backtrace won't normally pass through a
stub, thus having accurate unwind info for .glink doesn't help at all.

ld would need to insert unwind info for r2 on the call, but that
involves editing .eh_frame and in any case isn't accurate since
the r2 save doesn't happen until one or two instructions after the
call, in the stub.  I think we are stuck with frob_update_context.
Richard Henderson - July 28, 2011, 6:49 p.m.
On 07/28/2011 12:27 AM, Alan Modra wrote:
> On Wed, Jul 27, 2011 at 03:00:45PM +0930, Alan Modra wrote:
>> Ideally what I'd like to
>> do is have ld and gcc emit accurate r2 tracking unwind info and
>> dispense with hacks like frob_update_context.  If ld did emit accurate
>> unwind info for .glink, then the justification for frob_update_context
>> disappears.
> 
> For the record, this statement of mine doesn't make sense.  A .glink
> stub doesn't make a frame, so a backtrace won't normally pass through a
> stub, thus having accurate unwind info for .glink doesn't help at all.

It does, for the duration of the stub.

The whole problem is that the toc pointer copy in 40(1) is only valid
during indirect call sequences, and iff ld inserted a stub?  I.e.
direct calls between functions that share toc pointers never save
the copy?

Would it make sense, if a function has any indirect call, to move
the toc pointer save into the prologue?  You'd get to avoid that
store all the time.  Of course you'd not be able to sink the load
after the call, but it might still be a win.  And in that special
case you can annotate the r2 save slot just once, correctly.

For functions that do not contain an indirect function call, I
don't believe that there's any way to use DW_CFA_offset that
is always correct.

One could, however, move the code in frob_update_context into a
(series of) DW_CFA_val_expression's.

  DW_CFA_val_expression
    DW_OP_reg2		// Default to the value currently in R2
    DW_OP_regx LR	// Test the insn following the call, as per frob_update_context
    DW_OP_deref_size 4
    DW_OP_const4u 0xE8410028
    DW_OP_ne
    DW_OP_bra L1
    DW_OP_drop		// Could be omitted, given that we only examine top-of-stack at the end
    DW_OP_breg1 40	// Pull the value from *(R1+40)
    DW_OP_deref
  L1:

This version could appear in the CIE.  You'd have to adjust it
once LR gets saved to the stack, and R2 isn't itself being saved
as per above.

There isn't currently a hook in dwarf2cfi to add extra stuff to
the CIE program, but that wouldn't be hard to add.  The version
that gets emitted after LR is saved would need a new note as well.
But it all seems fairly tractable to actually implement, if we
think it'll actually solve the problem.


r~
David Edelsohn - July 28, 2011, 7:02 p.m.
On Thu, Jul 28, 2011 at 2:49 PM, Richard Henderson <rth@redhat.com> wrote:

> The whole problem is that the toc pointer copy in 40(1) is only valid
> during indirect call sequences, and iff ld inserted a stub?  I.e.
> direct calls between functions that share toc pointers never save
> the copy?
>
> Would it make sense, if a function has any indirect call, to move
> the toc pointer save into the prologue?  You'd get to avoid that
> store all the time.  Of course you'd not be able to sink the load
> after the call, but it might still be a win.  And in that special
> case you can annotate the r2 save slot just once, correctly.

Michael Meissner recently did move R2 save into the prologue, under
certain circumstances.  See TARGET_SAVE_TOC_INDIRECT.  Limitations
include alloca (unless one re-copies R2).  Mike also encountered
some problems with EH, which may be related to this discussion.

The other problem is that hoisting the store into the prologue is not
always profitable for performance.  It should be better once shrink
wrapping is implemented.  Currently the PPC ABI may perform a lot of
stores in the prologue if the function *may* make a call.  R2 adds yet
another store to the common path.

- David
Richard Henderson - July 28, 2011, 7:09 p.m.
On 07/28/2011 12:02 PM, David Edelsohn wrote:
> The other problem is that hoisting the store into the prologue is not
> always profitable for performance.  It should be better once shrink
> wrapping is implemented.  Currently the PPC ABI may perform a lot of
> stores in the prologue if the function *may* make a call.  R2 adds yet
> another store to the common path.

Well, even if we're not able to hoist the R2 store, we may be able
to simply add REG_CFA_OFFSET and REG_CFA_RESTORE notes to the insns
in the stream.


r~
Alan Modra - July 29, 2011, 4:25 a.m.
On Thu, Jul 28, 2011 at 12:09:51PM -0700, Richard Henderson wrote:
> Well, even if we're not able to hoist the R2 store, we may be able
> to simply add REG_CFA_OFFSET and REG_CFA_RESTORE notes to the insns
> in the stream.

You'd need to mark every non-local call with something that says
R2 may be saved, effectively duplicating md_frob_update in dwarf.
I guess that is possible even without extending our eh encoding, but
each call would have at least 6 bytes added to eh_frame:
   DW_CFA_expression, 2, 3, DW_OP_skip, offset_to_r2_prog
and you'd need to emit multiple copies of "r2_prog" for functions that
have a lot of calls, since the offset is limited to +/-32k.  I think
that would inflate the size of .eh_frame too much, and slow down
handling of exceptions dramatically.

Patch

Index: libgcc/config/rs6000/linux-unwind.h
===================================================================
--- libgcc/config/rs6000/linux-unwind.h	(revision 176780)
+++ libgcc/config/rs6000/linux-unwind.h	(working copy)
@@ -346,10 +346,28 @@  frob_update_context (struct _Unwind_Cont
 	 figure out if it was saved.  The big problem here is that the
 	 code that does the save/restore is generated by the linker, so
 	 we have no good way to determine at compile time what to do.  */
-      unsigned int *insn
-	= (unsigned int *) _Unwind_GetGR (context, R_LR);
-      if (insn && *insn == 0xE8410028)
-	_Unwind_SetGRPtr (context, 2, context->cfa + 40);
+      if (pc[0] == 0xF8410028
+	  || ((pc[0] & 0xFFFF0000) == 0x3D820000
+	      && pc[1] == 0xF8410028))
+	{
+	  /* We are in a plt call stub or r2 adjusting long branch stub,
+	     before r2 has been saved.  Keep REG_UNSAVED.  */
+	}
+      else if (pc[0] == 0x4E800421
+	       && pc[1] == 0xE8410028)
+	{
+	  /* We are at the bctrl instruction in a call via function
+	     pointer.  gcc always emits the load of the new r2 just
+	     before the bctrl.  */
+	  _Unwind_SetGRPtr (context, 2, context->cfa + 40);
+	}
+      else
+	{
+	  unsigned int *insn
+	    = (unsigned int *) _Unwind_GetGR (context, R_LR);
+	  if (insn && *insn == 0xE8410028)
+	    _Unwind_SetGRPtr (context, 2, context->cfa + 40);
+	}
     }
 #endif
 }