diff mbox

[RS6000] Stack tie fix.

Message ID 20120403084021.GK12569@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra April 3, 2012, 8:40 a.m. UTC
This patch merges the rs6000 stack_tie and frame_tie rtl, so that we
now should use a tie insn that mentions all base regs involved in
register saves and restores.  That should avoid any alias analysis
problems eg. http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01156.html.
I chose to put the regs of interest on the destination of the fake
set, rather than in the source as is currently done, because I figure
that relying on the compiler to not reorder reads is fragile.  Here's
an example of the new stack tie:

(set (mem:BLK (unspec:SI [(reg:SI 1) (reg:SI 12)] UNSPEC_TIE))
     (const_int 0))

Some notes:
- I tried putting the mem on both source and destination of the set,
  but that runs afoul of rtl dead code elimination.
- The rs6000_emit_epilogue places that called gen_frame_tie and
  gen_stack_tie both used alias set 0 mems.  I believe this was
  unnecessarily restrictive.
- CODE_FOR_stack_tie is mentioned in rs6000.c but not
  CODE_FOR_frame_tie.  Removing frame_tie fixes this sched bug too.

Bootstrapped and regression tested powerpc-linux with usual -O2
BOOT_CFLAGS and also -Os, and testcases from pr44199, pr30282, pr52828,
the url above and http://gcc.gnu.org/ml/gcc/2011-03/msg00123.html
examined for sanity.  OK to apply?

	PR target/52828
	* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with
	tie regs on destination of set.  Delete forward declaration.
	(rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls.
	(rs6000_emit_prologue): Likewise.
	(rs6000_emit_epilogue): Likewise.  Use in place of gen_frame_tie
	and gen_stack_tie.
	* config/rs6000/predicates.md (tie_operand): New.
	* config/rs6000/rs6000.md (restore_stack_block): Generate new
	stack tie rtl.
	(restore_stack_nonlocal): Likewise.
	(stack_tie): Update.
	(frame_tie): Delete.

Comments

Olivier Hainque April 3, 2012, 2:34 p.m. UTC | #1
Hello Alan,

Thanks a lot for following up on this one. Coincidentally, I was just
about to submit the alternate approach we had discussed about, after
David's comment at http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html.

We have experienced with it for a while on our gcc-4.5 series of
compilers and we are working on a transition to gcc 4.7.

This is of course a much heavier hammer so it would be nice if we can
indeed have a subtler way out :-)

Olivier

On Apr 3, 2012, at 10:40 , Alan Modra wrote:

> This patch merges the rs6000 stack_tie and frame_tie rtl, so that we
> now should use a tie insn that mentions all base regs involved in
> register saves and restores.  That should avoid any alias analysis
> problems eg. http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01156.html.
> I chose to put the regs of interest on the destination of the fake
> set, rather than in the source as is currently done, because I figure
> that relying on the compiler to not reorder reads is fragile.  Here's
> an example of the new stack tie:
> 
> (set (mem:BLK (unspec:SI [(reg:SI 1) (reg:SI 12)] UNSPEC_TIE))
>     (const_int 0))
> 
> Some notes:
> - I tried putting the mem on both source and destination of the set,
>  but that runs afoul of rtl dead code elimination.
> - The rs6000_emit_epilogue places that called gen_frame_tie and
>  gen_stack_tie both used alias set 0 mems.  I believe this was
>  unnecessarily restrictive.
> - CODE_FOR_stack_tie is mentioned in rs6000.c but not
>  CODE_FOR_frame_tie.  Removing frame_tie fixes this sched bug too.
> 
> Bootstrapped and regression tested powerpc-linux with usual -O2
> BOOT_CFLAGS and also -Os, and testcases from pr44199, pr30282, pr52828,
> the url above and http://gcc.gnu.org/ml/gcc/2011-03/msg00123.html
> examined for sanity.  OK to apply?
> 
> 	PR target/52828
> 	* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with
> 	tie regs on destination of set.  Delete forward declaration.
> 	(rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls.
> 	(rs6000_emit_prologue): Likewise.
> 	(rs6000_emit_epilogue): Likewise.  Use in place of gen_frame_tie
> 	and gen_stack_tie.
> 	* config/rs6000/predicates.md (tie_operand): New.
> 	* config/rs6000/rs6000.md (restore_stack_block): Generate new
> 	stack tie rtl.
> 	(restore_stack_nonlocal): Likewise.
> 	(stack_tie): Update.
> 	(frame_tie): Delete.
> 
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 185830)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -925,7 +925,6 @@ static const char *rs6000_invalid_within_doloop (c
> static bool rs6000_legitimate_address_p (enum machine_mode, rtx, bool);
> static bool rs6000_debug_legitimate_address_p (enum machine_mode, rtx, bool);
> static rtx rs6000_generate_compare (rtx, enum machine_mode);
> -static void rs6000_emit_stack_tie (void);
> static bool spe_func_has_64bit_regs_p (void);
> static rtx gen_frame_mem_offset (enum machine_mode, rtx, int);
> static unsigned rs6000_hash_constant (rtx);
> @@ -18517,12 +18516,28 @@ rs6000_aix_asm_output_dwarf_table_ref (char * fram
>    and the change to the stack pointer.  */
> 
> static void
> -rs6000_emit_stack_tie (void)
> +rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
> {
> -  rtx mem = gen_frame_mem (BLKmode,
> -			   gen_rtx_REG (Pmode, STACK_POINTER_REGNUM));
> +  rtx u;
> +  rtvec p;
> +  int i;
> 
> -  emit_insn (gen_stack_tie (mem));
> +  if (REGNO (fp) == STACK_POINTER_REGNUM
> +      || (hard_frame_needed
> +	  && REGNO (fp) == HARD_FRAME_POINTER_REGNUM))
> +    fp = NULL_RTX;
> +
> +  p = rtvec_alloc (1 + hard_frame_needed + (fp != NULL_RTX));
> +
> +  i = 0;
> +  RTVEC_ELT (p, i++) = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
> +  if (hard_frame_needed)
> +    RTVEC_ELT (p, i++) = gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
> +  if (fp != NULL_RTX)
> +    RTVEC_ELT (p, i++) = fp;
> +  u = gen_rtx_UNSPEC (Pmode, p, UNSPEC_TIE);
> +
> +  emit_insn (gen_stack_tie (gen_frame_mem (BLKmode, u)));
> }
> 
> /* Emit the correct code for allocating stack space, as insns.
> @@ -19142,7 +19157,7 @@ rs6000_emit_stack_reset (rs6000_stack_t *info,
>       || (TARGET_SPE_ABI
> 	  && info->spe_64bit_regs_used != 0
> 	  && info->first_gp_reg_save != 32))
> -    rs6000_emit_stack_tie ();
> +    rs6000_emit_stack_tie (frame_reg_rtx, frame_pointer_needed);
> 
>   if (frame_reg_rtx != sp_reg_rtx)
>     {
> @@ -19362,7 +19377,7 @@ rs6000_emit_prologue (void)
> 	}
>       rs6000_emit_allocate_stack (info->total_size, copy_reg);
>       if (frame_reg_rtx != sp_reg_rtx)
> -	rs6000_emit_stack_tie ();
> +	rs6000_emit_stack_tie (frame_reg_rtx, false);
>     }
> 
>   /* Handle world saves specially here.  */
> @@ -19866,7 +19881,7 @@ rs6000_emit_prologue (void)
> 	sp_offset = info->total_size;
>       rs6000_emit_allocate_stack (info->total_size, copy_reg);
>       if (frame_reg_rtx != sp_reg_rtx)
> -	rs6000_emit_stack_tie ();
> +	rs6000_emit_stack_tie (frame_reg_rtx, false);
>     }
> 
>   /* Set frame pointer, if needed.  */
> @@ -20437,13 +20452,7 @@ rs6000_emit_epilogue (int sibcall)
>       /* Prevent reordering memory accesses against stack pointer restore.  */
>       else if (cfun->calls_alloca
> 	       || offset_below_red_zone_p (-info->total_size))
> -	{
> -	  rtx mem1 = gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx);
> -	  rtx mem2 = gen_rtx_MEM (BLKmode, sp_reg_rtx);
> -	  MEM_NOTRAP_P (mem1) = 1;
> -	  MEM_NOTRAP_P (mem2) = 1;
> -	  emit_insn (gen_frame_tie (mem1, mem2));
> -	}
> +	rs6000_emit_stack_tie (frame_reg_rtx, true);
> 
>       insn = emit_insn (gen_add3_insn (frame_reg_rtx, hard_frame_pointer_rtx,
> 				       GEN_INT (info->total_size)));
> @@ -20456,11 +20465,7 @@ rs6000_emit_epilogue (int sibcall)
>       /* Prevent reordering memory accesses against stack pointer restore.  */
>       if (cfun->calls_alloca
> 	  || offset_below_red_zone_p (-info->total_size))
> -	{
> -	  rtx mem = gen_rtx_MEM (BLKmode, sp_reg_rtx);
> -	  MEM_NOTRAP_P (mem) = 1;
> -	  emit_insn (gen_stack_tie (mem));
> -	}
> +	rs6000_emit_stack_tie (frame_reg_rtx, false);
>       insn = emit_insn (gen_add3_insn (sp_reg_rtx, sp_reg_rtx,
> 				       GEN_INT (info->total_size)));
>       sp_offset = 0;
> Index: gcc/config/rs6000/predicates.md
> ===================================================================
> --- gcc/config/rs6000/predicates.md	(revision 185830)
> +++ gcc/config/rs6000/predicates.md	(working copy)
> @@ -1451,3 +1451,11 @@
> 
>   return 1;
> })
> +
> +;; Return 1 if OP is a stack tie operand.
> +(define_predicate "tie_operand"
> +  (match_code "mem")
> +{
> +  return (GET_CODE (XEXP (op, 0)) == UNSPEC
> +	  && XINT (XEXP (op, 0), 1) == UNSPEC_TIE);
> +})
> Index: gcc/config/rs6000/rs6000.md
> ===================================================================
> --- gcc/config/rs6000/rs6000.md	(revision 185830)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -11989,17 +11989,20 @@
> (define_expand "restore_stack_block"
>   [(set (match_dup 2) (match_dup 3))
>    (set (match_dup 4) (match_dup 2))
> -   (set (match_dup 5) (unspec:BLK [(match_dup 5)] UNSPEC_TIE))
> +   (set (match_dup 5) (const_int 0))
>    (set (match_operand 0 "register_operand" "")
> 	(match_operand 1 "register_operand" ""))]
>   ""
>   "
> {
> +  rtx u;
> +
>   operands[1] = force_reg (Pmode, operands[1]);
>   operands[2] = gen_reg_rtx (Pmode);
>   operands[3] = gen_frame_mem (Pmode, operands[0]);
>   operands[4] = gen_frame_mem (Pmode, operands[1]);
> -  operands[5] = gen_frame_mem (BLKmode, operands[0]);
> +  u = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[0]), UNSPEC_TIE);
> +  operands[5] = gen_frame_mem (BLKmode, u);
> }")
> 
> (define_expand "save_stack_nonlocal"
> @@ -12022,12 +12025,13 @@
>   [(set (match_dup 2) (match_operand 1 "memory_operand" ""))
>    (set (match_dup 3) (match_dup 4))
>    (set (match_dup 5) (match_dup 2))
> -   (set (match_dup 6) (unspec:BLK [(match_dup 6)] UNSPEC_TIE))
> +   (set (match_dup 6) (const_int 0))
>    (set (match_operand 0 "register_operand" "") (match_dup 3))]
>   ""
>   "
> {
>   int units_per_word = (TARGET_32BIT) ? 4 : 8;
> +  rtx u;
> 
>   /* Restore the backchain from the first word, sp from the second.  */
>   operands[2] = gen_reg_rtx (Pmode);
> @@ -12035,7 +12039,8 @@
>   operands[1] = adjust_address_nv (operands[1], Pmode, 0);
>   operands[4] = adjust_address_nv (operands[1], Pmode, units_per_word);
>   operands[5] = gen_frame_mem (Pmode, operands[3]);
> -  operands[6] = gen_frame_mem (BLKmode, operands[0]);
> +  u = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[0]), UNSPEC_TIE);
> +  operands[6] = gen_frame_mem (BLKmode, u);
> }")
> 
> ;; TOC register handling.
> @@ -15899,25 +15904,15 @@
>   [(set_attr "type" "branch")
>    (set_attr "length" "4")])
> 
> -; These are to explain that changes to the stack pointer should
> -; not be moved over stores to stack memory.
> +; This is to explain that changes to the stack pointer should
> +; not be moved over loads from or stores to stack memory.
> (define_insn "stack_tie"
> -  [(set (match_operand:BLK 0 "memory_operand" "+m")
> -        (unspec:BLK [(match_dup 0)] UNSPEC_TIE))]
> +  [(set (match_operand:BLK 0 "tie_operand" "")
> +	(const_int 0))]
>   ""
>   ""
>   [(set_attr "length" "0")])
> 
> -; Like stack_tie, but depend on both fp and sp based memory.
> -(define_insn "frame_tie"
> -  [(set (match_operand:BLK 0 "memory_operand" "+m")
> -	(unspec:BLK [(match_dup 0)
> -		     (match_operand:BLK 1 "memory_operand" "m")] UNSPEC_TIE))]
> -  ""
> -  ""
> -  [(set_attr "length" "0")])
> -
> -
> (define_expand "epilogue"
>   [(use (const_int 0))]
>   ""
> 
> -- 
> Alan Modra
> Australia Development Lab, IBM
>
Olivier Hainque April 3, 2012, 2:55 p.m. UTC | #2
On Apr 3, 2012, at 16:34 , Olivier Hainque wrote:
> Thanks a lot for following up on this one. Coincidentally, I was just
> about to submit the alternate approach we had discussed about, after
> David's comment at http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html.

> This is of course a much heavier hammer so it would be nice if we can
> indeed have a subtler way out :-)

 I realized that this last sentence is ambiguous ...

 To clarify: the heavier approach is the one I was about to submit
 (minor variation of Joseph's proposal in the thread just referenced),
 and the subtler way out is the one you are proposing here.

 Olivier
David Edelsohn April 3, 2012, 5:05 p.m. UTC | #3
On Tue, Apr 3, 2012 at 10:55 AM, Olivier Hainque <hainque@adacore.com> wrote:
>
> On Apr 3, 2012, at 16:34 , Olivier Hainque wrote:
>> Thanks a lot for following up on this one. Coincidentally, I was just
>> about to submit the alternate approach we had discussed about, after
>> David's comment at http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html.
>
>> This is of course a much heavier hammer so it would be nice if we can
>> indeed have a subtler way out :-)
>
>  To clarify: the heavier approach is the one I was about to submit
>  (minor variation of Joseph's proposal in the thread just referenced),
>  and the subtler way out is the one you are proposing here.

We can give Alan's patch a try.  I'm not sure if it is sufficient
given the experience of IBM's XL compiler.  I also would rather not
use the heavier hammer, but I don't want to leave a latent bug.

Thanks, David
Alan Modra April 4, 2012, 1:22 a.m. UTC | #4
On Tue, Apr 03, 2012 at 01:05:08PM -0400, David Edelsohn wrote:
> On Tue, Apr 3, 2012 at 10:55 AM, Olivier Hainque <hainque@adacore.com> wrote:
> >
> > On Apr 3, 2012, at 16:34 , Olivier Hainque wrote:
> >> Thanks a lot for following up on this one. Coincidentally, I was just
> >> about to submit the alternate approach we had discussed about, after
> >> David's comment at http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html.
> >
> >> This is of course a much heavier hammer so it would be nice if we can
> >> indeed have a subtler way out :-)
> >
> >  To clarify: the heavier approach is the one I was about to submit
> >  (minor variation of Joseph's proposal in the thread just referenced),
> >  and the subtler way out is the one you are proposing here.
> 
> We can give Alan's patch a try.  I'm not sure if it is sufficient
> given the experience of IBM's XL compiler.  I also would rather not
> use the heavier hammer, but I don't want to leave a latent bug.

I'll see whether my approach fixes pr30282 for gcc-4.4 which has known
deficiencies in alias analysis.  Olivier, would you be kind enough to
backport and test against other versions of gcc that needed your
bigger hammer?
Olivier Hainque April 5, 2012, 10:36 a.m. UTC | #5
Hello Alan,

On Apr 4, 2012, at 03:22 , Alan Modra wrote:
> I'll see whether my approach fixes pr30282 for gcc-4.4 which has known
> deficiencies in alias analysis.  Olivier, would you be kind enough to
> backport and test against other versions of gcc that needed your
> bigger hammer?

 Sure. I can debug the paths taken for the testcases where we had
 observed problems and see. I'll see if I can do a few tests comparing
 generated code on some source base. The problems are hard to observe
 at runtime because they are very timing/event/environment dependant.

 I still feel a bit unclear/concerned on a couple of aspects. One is:
 
 There are lots of places in the emit_epilogue code that assign
 frame_reg_rtx with different possible values, (hard_fp, sp, r11)
 before we first get to points where we emit ties. There are also
 multiple places where we do emit ties.

 It is not at all obvious to me that the all places where we emit ties
 have an accurate perception of what frame_reg's were possibly used before.

 IOW, I am not 100% convinced that we cannot have a bad case where
 we emit a tie that misses a reg reference. The various paths are all
 controlled by intricate conditions, so getting that 100% conviction
 (at least on paper) isn't easy to me.

 Is it clearer to you ?

 FWIW, I had made an experiment at trying to extract subfunctions,
 which might help such reasoning.  Set of patches attached. This doesn't
 apply to the current mainline, would need to refined, and we probably
 couldn't/shouldn't put something like this on the path to the resolution
 of our current issue, so this is JIC you are curious.
Alan Modra April 5, 2012, 2:03 p.m. UTC | #6
Hi Olivier,

On Thu, Apr 05, 2012 at 12:36:01PM +0200, Olivier Hainque wrote:
> On Apr 4, 2012, at 03:22 , Alan Modra wrote:
> > I'll see whether my approach fixes pr30282 for gcc-4.4 which has known
> > deficiencies in alias analysis.  Olivier, would you be kind enough to
> > backport and test against other versions of gcc that needed your
> > bigger hammer?

Well it turns out that gcc-4.4 still gets this testcase from pr30282
wrong.

int find_num(int i)
{
    int arr[5] = {0, 1, 2, 3, 4};
    return arr[i];
}

The read from arr[i] is scheduled past the frame teardown.

[snip]
>  There are lots of places in the emit_epilogue code that assign
>  frame_reg_rtx with different possible values, (hard_fp, sp, r11)

r12 too

>  before we first get to points where we emit ties. There are also
>  multiple places where we do emit ties.
> 
>  It is not at all obvious to me that the all places where we emit ties
>  have an accurate perception of what frame_reg's were possibly used before.
> 
>  IOW, I am not 100% convinced that we cannot have a bad case where
>  we emit a tie that misses a reg reference. The various paths are all
>  controlled by intricate conditions, so getting that 100% conviction
>  (at least on paper) isn't easy to me.
> 
>  Is it clearer to you ?

I spent quite some time looking at it. ;)  Have you spotted an error
somewhere, apart from spe_save_area_ptr not being mentioned in the
stack tie?  I left that one for a later fix since the SPE reg saves
use alias set 0 mems (which is another bug).

Hmm, I see I accidentally left out an assert from the stack tie
patch.  This one may make you feel a little better.  :)

  /* Update stack and set back pointer unless this is V.4,
     for which it was done previously.  */
  if (!WORLD_SAVE_P (info) && info->push_p
      && !(DEFAULT_ABI == ABI_V4 || crtl->calls_eh_return))
    {
      rtx copy_reg = NULL;

      /* If using some other frame reg previously, then we ought to
	 ensure it is mentioned in the stack tie emitted below.  */
      gcc_checking_assert (REGNO (frame_reg_rtx) == 1
			   || REGNO (frame_reg_rtx) == 12);


>  FWIW, I had made an experiment at trying to extract subfunctions,
>  which might help such reasoning.  Set of patches attached. This doesn't
>  apply to the current mainline, would need to refined, and we probably
>  couldn't/shouldn't put something like this on the path to the resolution
>  of our current issue, so this is JIC you are curious.

I think this is worthwhile doing, but more important to try to make
the logic simpler.  For example, this

  /* If we need to save CR, put it into r12 or r11.  */
  if (!WORLD_SAVE_P (info) && info->cr_save_p && frame_reg_rtx != frame_ptr_rtx)
    {
      rtx set;

      cr_save_rtx
	= gen_rtx_REG (SImode, DEFAULT_ABI == ABI_AIX && !saving_GPRs_inline
		       ? 11 : 12);

is better written as

  /* If we need to save CR, put it into r12 or r11.  */
  cr_save_regno = DEFAULT_ABI == ABI_AIX && !saving_GPRs_inline ? 11 : 12;
  if (!WORLD_SAVE_P (info)
      && info->cr_save_p
      && REGNO (frame_reg_rtx) != cr_save_regno)
    {
      rtx set;

      cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno);

This way it's obvious why there is a test of frame_reg_rtx, and that
the test is correct.  ie. you aren't clobbering frame_reg_rtx with cr.
Olivier Hainque April 5, 2012, 5:23 p.m. UTC | #7
On Apr 5, 2012, at 16:03 , Alan Modra wrote:

> Well it turns out that gcc-4.4 still gets this testcase from pr30282
> wrong.

 Hmph :-(

>> There are lots of places in the emit_epilogue code that assign
>> frame_reg_rtx with different possible values, (hard_fp, sp, r11)
> 
> r12 too

 Right ... 

>> It is not at all obvious to me that the all places where we emit ties
>> have an accurate perception of what frame_reg's were possibly used before.
>> Is it clearer to you ?
> 
> I spent quite some time looking at it. ;)

 Oh, sure, that's why I'm asking :)

> Have you spotted an error
> somewhere, apart from spe_save_area_ptr not being mentioned in the
> stack tie?

 No, not really ... (more below)

> Hmm, I see I accidentally left out an assert from the stack tie
> patch.  This one may make you feel a little better.  :)
> 
>  /* Update stack and set back pointer unless this is V.4,
>     for which it was done previously.  */

>      /* If using some other frame reg previously, then we ought to
> 	 ensure it is mentioned in the stack tie emitted below.  */
>      gcc_checking_assert (REGNO (frame_reg_rtx) == 1
> 			   || REGNO (frame_reg_rtx) == 12);

 My concern (apart from possible remaining aliasing issues) was exactly
 what the comment above expresses, but right away I don't see how the assert
 does the check expressed by the comment, and there are other places where
 we emit ties.

 I need to review the function with your changes applied in greater detail.
 
 IIUC (please correct me if I'm wrong), every time we need to emit a tie, we
 must ensure that it references all the base regs that were used before to
 restore regs from the frame (to prevent the sp adjustment past the tie from
 moving prior to any of these accesses)
 
 Since we have several possible uses of different registers ahead, controlled
 by intricate conditions, I would have thought we'd need to maintain a list
 of these registers, to which we add every time we use a new one to access, and
 which we'd use to feed the tie references.

 If we don't do that, I still find it difficult to see that the ties alway do
 reference the proper set of regs (all those used to access the frame earlier).

 It might just be that there's part of the logic I still haven't grasped, so ...

>> FWIW, I had made an experiment at trying to extract subfunctions,
>> which might help such reasoning.

> I think this is worthwhile doing, but more important to try to make
> the logic simpler.
[...]

Entirely agreed! The exercise was just aimed at helping me understand
some of the logic :-)

Thanks a lot for your feedback, much appreciated,

Olivier
Alan Modra April 6, 2012, 12:32 a.m. UTC | #8
On Thu, Apr 05, 2012 at 07:23:20PM +0200, Olivier Hainque wrote:
>  IIUC (please correct me if I'm wrong), every time we need to emit a tie, we
>  must ensure that it references all the base regs that were used before to
>  restore regs from the frame (to prevent the sp adjustment past the tie from
>  moving prior to any of these accesses)

The minimal requirement is:  In the prologue, the next frame write
must be prevented from moving before the stack adjust.  In the
epilogue, the previous frame read must be prevented from moving after
the stack adjust.  If the adjacent write/read uses r1 as a base reg,
then we don't need a stack tie at all.

Writes/reads further away won't be reordered over the adjacent
write/read.  At least, I've never seen gcc do so, even with older
versions known to have bugs in aliasing analysis.  Every single case
I've seen of improper reordering had a stack tie that didn't mention
the base reg used by the adjacent write/read (or no tie).

(Hah!  In writing this, I remember why I took out that assert.  What
happens prior to the tie in the prologue is of no concern.)
Olivier Hainque April 6, 2012, 3:25 p.m. UTC | #9
Hello Alan,

On Apr 6, 2012, at 02:32 , Alan Modra wrote:
[...]
> In the epilogue, the previous frame read must be prevented from moving after
> the stack adjust.  If the adjacent write/read uses r1 as a base reg,
> then we don't need a stack tie at all.

Right

> Writes/reads further away won't be reordered over the adjacent
> write/read.  At least, I've never seen gcc do so, even with older
> versions known to have bugs in aliasing analysis.

 Ah, I hadn't understood that there was this implicit assumption.
 I understand better that part of the logic then. (would be worth
 a comment IMHO) Thanks!

> (Hah!  In writing this, I remember why I took out that assert.  What
> happens prior to the tie in the prologue is of no concern.)

 :-)

 I have been able to verify that your patch helps our previously
 failing case, starting from the svn rev and failure mode analyzed per 
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01156.html.

 We get through the same compiler circuits, down to base_alias_check
 now with

  x = (unspec:SI [
        (reg/f:SI 1 1)
        (reg/f:SI 31 31)
        (reg:SI 11 11)
    ] UNSPEC_TIE)

 instead of (reg:SI 1).

 x_base is now null and such that we rapidly conclude that "we don't
 know anything about aliasing", so the outer mem accesses are declared
 conflicting.

 Have you discovered where the problem you are still observing
 is coming from (with another case on 4.4) ? I was planning to look
 into it and realized that maybe you were already doing so.

 Thanks much for your feedback,

 Olivier
Alan Modra April 11, 2012, 4:15 a.m. UTC | #10
On Fri, Apr 06, 2012 at 05:25:15PM +0200, Olivier Hainque wrote:
>  Have you discovered where the problem you are still observing
>  is coming from (with another case on 4.4) ? I was planning to look
>  into it and realized that maybe you were already doing so.

In gcc-4.4, alias.c:write_dependence_p has

  if (DIFFERENT_ALIAS_SETS_P (x, mem))
    return 0;

immediately after the tests for (mem:BLK (scratch)) and
ALIAS_SET_MEMORY_BARRIER.  On find_num.c the read from the stack arr[]
is alias set 3, the stack tie is alias set 2, so they are seen as not
conflicting.

For mainline we have the same alias sets but with the addresses
involved, (unspec [reg 1] UNSPEC_TIE) and (plus (reg 9) (reg 3)), we
run all the way down to rtx_refs_may_alias_p where ao_ref_from_mem for
the stack tie returns false.  So anti_dependence returns true.

Hmm, the fact that alias analysis on mainline treats my fancy barriers
like this says I may as well have not bothered.  (mem:BLK (scratch))
performs identically at the moment.
Richard Biener April 11, 2012, 9 a.m. UTC | #11
On Wed, Apr 11, 2012 at 6:15 AM, Alan Modra <amodra@gmail.com> wrote:
> On Fri, Apr 06, 2012 at 05:25:15PM +0200, Olivier Hainque wrote:
>>  Have you discovered where the problem you are still observing
>>  is coming from (with another case on 4.4) ? I was planning to look
>>  into it and realized that maybe you were already doing so.
>
> In gcc-4.4, alias.c:write_dependence_p has
>
>  if (DIFFERENT_ALIAS_SETS_P (x, mem))
>    return 0;
>
> immediately after the tests for (mem:BLK (scratch)) and
> ALIAS_SET_MEMORY_BARRIER.  On find_num.c the read from the stack arr[]
> is alias set 3, the stack tie is alias set 2, so they are seen as not
> conflicting.
>
> For mainline we have the same alias sets but with the addresses
> involved, (unspec [reg 1] UNSPEC_TIE) and (plus (reg 9) (reg 3)), we
> run all the way down to rtx_refs_may_alias_p where ao_ref_from_mem for
> the stack tie returns false.  So anti_dependence returns true.
>
> Hmm, the fact that alias analysis on mainline treats my fancy barriers
> like this says I may as well have not bothered.  (mem:BLK (scratch))
> performs identically at the moment.

Well - you are expecting generic code to understand your arch-specific
UNSPEC.  It of course cannot.  From reading back a bit I understand
that you want alias analysis to consider this a barrier for all memory
accesses that all mentioned registers possibly point to?  I'd have used

 (USE (mem:BLK (reg 1)))
 (CLOBBER (mem:BLK (reg 1)))

repeated, for each register (maybe you can avoid the USE if the CLOBBER
is implicitely considered a use, or maybe we don't need to care because
we do not specify what piece of memory is possibly clobbered, we just
specify its base address).

Oh, and the above would need to be handled explicitely anyway in alias.c

Richard.

> --
> Alan Modra
> Australia Development Lab, IBM
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 185830)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -925,7 +925,6 @@  static const char *rs6000_invalid_within_doloop (c
 static bool rs6000_legitimate_address_p (enum machine_mode, rtx, bool);
 static bool rs6000_debug_legitimate_address_p (enum machine_mode, rtx, bool);
 static rtx rs6000_generate_compare (rtx, enum machine_mode);
-static void rs6000_emit_stack_tie (void);
 static bool spe_func_has_64bit_regs_p (void);
 static rtx gen_frame_mem_offset (enum machine_mode, rtx, int);
 static unsigned rs6000_hash_constant (rtx);
@@ -18517,12 +18516,28 @@  rs6000_aix_asm_output_dwarf_table_ref (char * fram
    and the change to the stack pointer.  */
 
 static void
-rs6000_emit_stack_tie (void)
+rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
 {
-  rtx mem = gen_frame_mem (BLKmode,
-			   gen_rtx_REG (Pmode, STACK_POINTER_REGNUM));
+  rtx u;
+  rtvec p;
+  int i;
 
-  emit_insn (gen_stack_tie (mem));
+  if (REGNO (fp) == STACK_POINTER_REGNUM
+      || (hard_frame_needed
+	  && REGNO (fp) == HARD_FRAME_POINTER_REGNUM))
+    fp = NULL_RTX;
+
+  p = rtvec_alloc (1 + hard_frame_needed + (fp != NULL_RTX));
+
+  i = 0;
+  RTVEC_ELT (p, i++) = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  if (hard_frame_needed)
+    RTVEC_ELT (p, i++) = gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
+  if (fp != NULL_RTX)
+    RTVEC_ELT (p, i++) = fp;
+  u = gen_rtx_UNSPEC (Pmode, p, UNSPEC_TIE);
+
+  emit_insn (gen_stack_tie (gen_frame_mem (BLKmode, u)));
 }
 
 /* Emit the correct code for allocating stack space, as insns.
@@ -19142,7 +19157,7 @@  rs6000_emit_stack_reset (rs6000_stack_t *info,
       || (TARGET_SPE_ABI
 	  && info->spe_64bit_regs_used != 0
 	  && info->first_gp_reg_save != 32))
-    rs6000_emit_stack_tie ();
+    rs6000_emit_stack_tie (frame_reg_rtx, frame_pointer_needed);
   
   if (frame_reg_rtx != sp_reg_rtx)
     {
@@ -19362,7 +19377,7 @@  rs6000_emit_prologue (void)
 	}
       rs6000_emit_allocate_stack (info->total_size, copy_reg);
       if (frame_reg_rtx != sp_reg_rtx)
-	rs6000_emit_stack_tie ();
+	rs6000_emit_stack_tie (frame_reg_rtx, false);
     }
 
   /* Handle world saves specially here.  */
@@ -19866,7 +19881,7 @@  rs6000_emit_prologue (void)
 	sp_offset = info->total_size;
       rs6000_emit_allocate_stack (info->total_size, copy_reg);
       if (frame_reg_rtx != sp_reg_rtx)
-	rs6000_emit_stack_tie ();
+	rs6000_emit_stack_tie (frame_reg_rtx, false);
     }
 
   /* Set frame pointer, if needed.  */
@@ -20437,13 +20452,7 @@  rs6000_emit_epilogue (int sibcall)
       /* Prevent reordering memory accesses against stack pointer restore.  */
       else if (cfun->calls_alloca
 	       || offset_below_red_zone_p (-info->total_size))
-	{
-	  rtx mem1 = gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx);
-	  rtx mem2 = gen_rtx_MEM (BLKmode, sp_reg_rtx);
-	  MEM_NOTRAP_P (mem1) = 1;
-	  MEM_NOTRAP_P (mem2) = 1;
-	  emit_insn (gen_frame_tie (mem1, mem2));
-	}
+	rs6000_emit_stack_tie (frame_reg_rtx, true);
 
       insn = emit_insn (gen_add3_insn (frame_reg_rtx, hard_frame_pointer_rtx,
 				       GEN_INT (info->total_size)));
@@ -20456,11 +20465,7 @@  rs6000_emit_epilogue (int sibcall)
       /* Prevent reordering memory accesses against stack pointer restore.  */
       if (cfun->calls_alloca
 	  || offset_below_red_zone_p (-info->total_size))
-	{
-	  rtx mem = gen_rtx_MEM (BLKmode, sp_reg_rtx);
-	  MEM_NOTRAP_P (mem) = 1;
-	  emit_insn (gen_stack_tie (mem));
-	}
+	rs6000_emit_stack_tie (frame_reg_rtx, false);
       insn = emit_insn (gen_add3_insn (sp_reg_rtx, sp_reg_rtx,
 				       GEN_INT (info->total_size)));
       sp_offset = 0;
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 185830)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1451,3 +1451,11 @@ 
 
   return 1;
 })
+
+;; Return 1 if OP is a stack tie operand.
+(define_predicate "tie_operand"
+  (match_code "mem")
+{
+  return (GET_CODE (XEXP (op, 0)) == UNSPEC
+	  && XINT (XEXP (op, 0), 1) == UNSPEC_TIE);
+})
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 185830)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -11989,17 +11989,20 @@ 
 (define_expand "restore_stack_block"
   [(set (match_dup 2) (match_dup 3))
    (set (match_dup 4) (match_dup 2))
-   (set (match_dup 5) (unspec:BLK [(match_dup 5)] UNSPEC_TIE))
+   (set (match_dup 5) (const_int 0))
    (set (match_operand 0 "register_operand" "")
 	(match_operand 1 "register_operand" ""))]
   ""
   "
 {
+  rtx u;
+
   operands[1] = force_reg (Pmode, operands[1]);
   operands[2] = gen_reg_rtx (Pmode);
   operands[3] = gen_frame_mem (Pmode, operands[0]);
   operands[4] = gen_frame_mem (Pmode, operands[1]);
-  operands[5] = gen_frame_mem (BLKmode, operands[0]);
+  u = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[0]), UNSPEC_TIE);
+  operands[5] = gen_frame_mem (BLKmode, u);
 }")
 
 (define_expand "save_stack_nonlocal"
@@ -12022,12 +12025,13 @@ 
   [(set (match_dup 2) (match_operand 1 "memory_operand" ""))
    (set (match_dup 3) (match_dup 4))
    (set (match_dup 5) (match_dup 2))
-   (set (match_dup 6) (unspec:BLK [(match_dup 6)] UNSPEC_TIE))
+   (set (match_dup 6) (const_int 0))
    (set (match_operand 0 "register_operand" "") (match_dup 3))]
   ""
   "
 {
   int units_per_word = (TARGET_32BIT) ? 4 : 8;
+  rtx u;
 
   /* Restore the backchain from the first word, sp from the second.  */
   operands[2] = gen_reg_rtx (Pmode);
@@ -12035,7 +12039,8 @@ 
   operands[1] = adjust_address_nv (operands[1], Pmode, 0);
   operands[4] = adjust_address_nv (operands[1], Pmode, units_per_word);
   operands[5] = gen_frame_mem (Pmode, operands[3]);
-  operands[6] = gen_frame_mem (BLKmode, operands[0]);
+  u = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[0]), UNSPEC_TIE);
+  operands[6] = gen_frame_mem (BLKmode, u);
 }")
 
 ;; TOC register handling.
@@ -15899,25 +15904,15 @@ 
   [(set_attr "type" "branch")
    (set_attr "length" "4")])
 
-; These are to explain that changes to the stack pointer should
-; not be moved over stores to stack memory.
+; This is to explain that changes to the stack pointer should
+; not be moved over loads from or stores to stack memory.
 (define_insn "stack_tie"
-  [(set (match_operand:BLK 0 "memory_operand" "+m")
-        (unspec:BLK [(match_dup 0)] UNSPEC_TIE))]
+  [(set (match_operand:BLK 0 "tie_operand" "")
+	(const_int 0))]
   ""
   ""
   [(set_attr "length" "0")])
 
-; Like stack_tie, but depend on both fp and sp based memory.
-(define_insn "frame_tie"
-  [(set (match_operand:BLK 0 "memory_operand" "+m")
-	(unspec:BLK [(match_dup 0)
-		     (match_operand:BLK 1 "memory_operand" "m")] UNSPEC_TIE))]
-  ""
-  ""
-  [(set_attr "length" "0")])
-
-
 (define_expand "epilogue"
   [(use (const_int 0))]
   ""