diff mbox

RFA: enable LRA for rs6000 [patch for WRF]

Message ID 20130416225639.GA16621@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner April 16, 2013, 10:56 p.m. UTC
I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
        (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
     (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
        (nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also eliminate the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine, I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
	control whether we only allow int modes to go in the CTR, LR,
	VRSAVE, VSCR registers.
	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
	(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
	constrained.
	(rs6000_option_override_internal): Set -mconstrain-regs if we are
	using the LRA register allocator.

	* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
	does not match.

Comments

Vladimir Makarov April 17, 2013, 2:14 p.m. UTC | #1
On 13-04-16 6:56 PM, Michael Meissner wrote:
> I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
> allocator.
>
> At one point LRA has decided to use the CTR to hold a CCmode value:
>
> (insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
>          (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1}
>       (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
>          (nil)))
>
> Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
> but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
> IMHO, LRA should not be generating NOP moves that are later deleted.
>
> There are two ways to solve the problem.  One is not to let anything but int
> modes into CTR/LR, which will also eliminate the register allocator from
> spilling floating point values there (which we've seen in the past, but the
> last time I tried to eliminate it I couldn't).  The following patch does this,
> and also changes the assertion to call fatal_insn_not_found to make it clearer
> what the error is.
>
> I imagine, I could add a NOP move insn to movcc_internal1, but that just
> strikes me as wrong.
>
> Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
> can't bootstrap the compiler using --with-cpu=power7, which I will get to
> tomorrow.
>
> 2013-04-16  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
> 	* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
> 	control whether we only allow int modes to go in the CTR, LR,
> 	VRSAVE, VSCR registers.
> 	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
> 	(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
> 	constrained.
> 	(rs6000_option_override_internal): Set -mconstrain-regs if we are
> 	using the LRA register allocator.
>
> 	* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
> 	does not match.
>
Mike, thanks for the patch and all the SPEC2006 data  (which are very 
useful as I have no access to power machine which can be used for 
benchmarking).  I guess that may be some benchmark scores are lower 
because of LRA lacks some micro-optimizations which reload implements 
through many power hooks (e.g. LRA does not use push reload).  Although 
sometimes it is not a bad thing (e.g. LRA does not use  
SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the stack slots for 
other useful things).

In general I got impression that power7 is the most difficult port for 
LRA.  If we manage to port it, LRA ports for other targets will be easier.

I also reproduced bootstrap failure --with-cpu=power7 and I am going to 
work on this and after that on SPEC2006 you wrote about.
Michael Meissner April 17, 2013, 4:10 p.m. UTC | #2
On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote:
> Mike, thanks for the patch and all the SPEC2006 data  (which are
> very useful as I have no access to power machine which can be used
> for benchmarking).  I guess that may be some benchmark scores are
> lower because of LRA lacks some micro-optimizations which reload
> implements through many power hooks (e.g. LRA does not use push
> reload).  Although sometimes it is not a bad thing (e.g. LRA does
> not use  SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the
> stack slots for other useful things).

SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7,
where we need a larger stack slot to hold spilled SDmode values (power6 did not
have the LFIWZX instruction that is needed to load SDmode values into floating
point registers).

> In general I got impression that power7 is the most difficult port
> for LRA.  If we manage to port it, LRA ports for other targets will
> be easier.

I dunno, has LRA been ported to the SH yet?

> I also reproduced bootstrap failure --with-cpu=power7 and I am going
> to work on this and after that on SPEC2006 you wrote about.

Ok.
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 197925)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -522,3 +522,7 @@  Control whether we save the TOC in the p
 mvsx-timode
 Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags)
 ; Allow/disallow TImode in VSX registers
+
+mconstrain-regs
+Target Undocumented Mask(CONSTRAIN_REGS) Var(rs6000_isa_flags)
+; Only allow ints of certain modes to go in SPRs
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 197925)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1608,6 +1608,18 @@  rs6000_hard_regno_mode_ok (int regno, en
   if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode))
     return 1;
 
+  /* See if we need to be stricter about what goes into the special
+     registers (LR, CTR, VSAVE, VSCR).  */
+  if (TARGET_CONSTRAIN_REGS)
+    {
+      if (regno == LR_REGNO || regno == CTR_REGNO)
+	return (GET_MODE_CLASS (mode) == MODE_INT
+		&& rs6000_hard_regno_nregs[mode][regno] == 1);
+
+      if (regno == VRSAVE_REGNO || regno == VSCR_REGNO)
+	return (mode == SImode);
+    }
+
   /* We cannot put non-VSX TImode or PTImode anywhere except general register
      and it must be able to fit within the register set.  */
 
@@ -2054,6 +2066,9 @@  rs6000_debug_reg_global (void)
   if (targetm.lra_p ())
     fprintf (stderr, DEBUG_FMT_S, "lra", "true");
 
+  if (TARGET_CONSTRAIN_REGS)
+    fprintf (stderr, DEBUG_FMT_S, "constrain-regs", "true");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2945,6 +2960,12 @@  rs6000_option_override_internal (bool gl
 	}
     }
 
+  /* If we are using the LRA register allocator, constrain the SPRs to only
+     have integer modes, and not modes like CC.  */
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_CONSTRAIN_REGS) == 0
+      && targetm.lra_p ())
+    rs6000_isa_flags |= OPTION_MASK_CONSTRAIN_REGS;
+
   /* Place FP constants in the constant pool instead of TOC
      if section anchors enabled.  */
   if (flag_section_anchors)
Index: gcc/lra.c
===================================================================
--- gcc/lra.c	(revision 197925)
+++ gcc/lra.c	(working copy)
@@ -1996,7 +1996,8 @@  check_rtl (bool final_p)
 	if (final_p)
 	  {
 	    extract_insn (insn);
-	    lra_assert (constrain_operands (1));
+	    if (!constrain_operands (1))
+	      fatal_insn_not_found (insn);
 	    continue;
 	  }
 	if (insn_invalid_p (insn, false))