Message ID | 20130416225639.GA16621@ibm-tiger.the-meissners.org |
---|---|
State | New |
Headers | show |
On 13-04-16 6:56 PM, Michael Meissner wrote: > I tracked down the bug with the spec 2006 benchmark WRF using the LRA register > allocator. > > At one point LRA has decided to use the CTR to hold a CCmode value: > > (insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411]) > (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 {*movcc_internal1} > (expr_list:REG_DEAD (reg:CC 66 ctr [4411]) > (nil))) > > Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r, > but it doesn't have a move to cover the nop move of moving the ctr to the ctr. > IMHO, LRA should not be generating NOP moves that are later deleted. > > There are two ways to solve the problem. One is not to let anything but int > modes into CTR/LR, which will also eliminate the register allocator from > spilling floating point values there (which we've seen in the past, but the > last time I tried to eliminate it I couldn't). The following patch does this, > and also changes the assertion to call fatal_insn_not_found to make it clearer > what the error is. > > I imagine, I could add a NOP move insn to movcc_internal1, but that just > strikes me as wrong. > > Note, this does not fix the 32-bit failure in dealII, and I also noticed that I > can't bootstrap the compiler using --with-cpu=power7, which I will get to > tomorrow. > > 2013-04-16 Michael Meissner <meissner@linux.vnet.ibm.com> > > * config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to > control whether we only allow int modes to go in the CTR, LR, > VRSAVE, VSCR registers. > * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise. > (rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are > constrained. > (rs6000_option_override_internal): Set -mconstrain-regs if we are > using the LRA register allocator. > > * lra.c (check_rtl): Use fatal_insn_not_found to report constraint > does not match. > Mike, thanks for the patch and all the SPEC2006 data (which are very useful as I have no access to power machine which can be used for benchmarking). I guess that may be some benchmark scores are lower because of LRA lacks some micro-optimizations which reload implements through many power hooks (e.g. LRA does not use push reload). Although sometimes it is not a bad thing (e.g. LRA does not use SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the stack slots for other useful things). In general I got impression that power7 is the most difficult port for LRA. If we manage to port it, LRA ports for other targets will be easier. I also reproduced bootstrap failure --with-cpu=power7 and I am going to work on this and after that on SPEC2006 you wrote about.
On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote: > Mike, thanks for the patch and all the SPEC2006 data (which are > very useful as I have no access to power machine which can be used > for benchmarking). I guess that may be some benchmark scores are > lower because of LRA lacks some micro-optimizations which reload > implements through many power hooks (e.g. LRA does not use push > reload). Although sometimes it is not a bad thing (e.g. LRA does > not use SECONDARY_MEMORY_NEEDED_RTX which permits to reuse the > stack slots for other useful things). SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7, where we need a larger stack slot to hold spilled SDmode values (power6 did not have the LFIWZX instruction that is needed to load SDmode values into floating point registers). > In general I got impression that power7 is the most difficult port > for LRA. If we manage to port it, LRA ports for other targets will > be easier. I dunno, has LRA been ported to the SH yet? > I also reproduced bootstrap failure --with-cpu=power7 and I am going > to work on this and after that on SPEC2006 you wrote about. Ok.
Index: gcc/config/rs6000/rs6000.opt =================================================================== --- gcc/config/rs6000/rs6000.opt (revision 197925) +++ gcc/config/rs6000/rs6000.opt (working copy) @@ -522,3 +522,7 @@ Control whether we save the TOC in the p mvsx-timode Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags) ; Allow/disallow TImode in VSX registers + +mconstrain-regs +Target Undocumented Mask(CONSTRAIN_REGS) Var(rs6000_isa_flags) +; Only allow ints of certain modes to go in SPRs Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 197925) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1608,6 +1608,18 @@ rs6000_hard_regno_mode_ok (int regno, en if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode)) return 1; + /* See if we need to be stricter about what goes into the special + registers (LR, CTR, VSAVE, VSCR). */ + if (TARGET_CONSTRAIN_REGS) + { + if (regno == LR_REGNO || regno == CTR_REGNO) + return (GET_MODE_CLASS (mode) == MODE_INT + && rs6000_hard_regno_nregs[mode][regno] == 1); + + if (regno == VRSAVE_REGNO || regno == VSCR_REGNO) + return (mode == SImode); + } + /* We cannot put non-VSX TImode or PTImode anywhere except general register and it must be able to fit within the register set. */ @@ -2054,6 +2066,9 @@ rs6000_debug_reg_global (void) if (targetm.lra_p ()) fprintf (stderr, DEBUG_FMT_S, "lra", "true"); + if (TARGET_CONSTRAIN_REGS) + fprintf (stderr, DEBUG_FMT_S, "constrain-regs", "true"); + fprintf (stderr, DEBUG_FMT_S, "plt-format", TARGET_SECURE_PLT ? "secure" : "bss"); fprintf (stderr, DEBUG_FMT_S, "struct-return", @@ -2945,6 +2960,12 @@ rs6000_option_override_internal (bool gl } } + /* If we are using the LRA register allocator, constrain the SPRs to only + have integer modes, and not modes like CC. */ + if ((rs6000_isa_flags_explicit & OPTION_MASK_CONSTRAIN_REGS) == 0 + && targetm.lra_p ()) + rs6000_isa_flags |= OPTION_MASK_CONSTRAIN_REGS; + /* Place FP constants in the constant pool instead of TOC if section anchors enabled. */ if (flag_section_anchors) Index: gcc/lra.c =================================================================== --- gcc/lra.c (revision 197925) +++ gcc/lra.c (working copy) @@ -1996,7 +1996,8 @@ check_rtl (bool final_p) if (final_p) { extract_insn (insn); - lra_assert (constrain_operands (1)); + if (!constrain_operands (1)) + fatal_insn_not_found (insn); continue; } if (insn_invalid_p (insn, false))