Message ID | 20110625183336.GA9651@intel.com |
---|---|
State | New |
Headers | show |
On Mon, Jun 27, 2011 at 7:47 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > H.J. Lu wrote: > >> When reload gets: >> >> (insn 588 587 589 28 (set (mem:DF (zero_extend:DI (plus:SI (subreg:SI >> (reg/v/f:DI 182 [ b ]) 0) >> (const_int 8 [0x8]))) [4 MEM[base: b_96(D), index: >> D.15020_278, step: 8, offset: 0B]+0 S8 A64]) >> (reg:DF 340 [ D.14980 ])) spooles.c:291 106 >> {*movdf_internal_rex64} >> (expr_list:REG_DEAD (reg:DF 340 [ D.14980 ]) >> (nil))) > > Reloading of PLUS expressions is a long-standing problem. gen_reload > supports those only for PLUSes that look more or less like address > computations, and then only the "usual" cases. > > Is the address above (once the pseudo reg:DI 182 is replaced by > a hard reg) really a legitimate address on your platform? If not, > this would need to be fixed at some earlier place. > > If this *is* a valid address (and just not valid for this particular > insn pattern), the back-end needs to provide some means to reload to > allow reloading of such expressions. This can be done either by > providing an insn (plus post-reload splitter if necessary), or else > defining a secondary reload to handle the case where additional > registers are required. Assuming the generic gen_reload code is > powerful enough to handle complex expressions like this is probably > not wise ... > > In any case, however, gen_reload should not generate *wrong* > code, so there's indeed a bug here. > > However, this: > >> - if (CONSTANT_P (op1) || MEM_P (op1) || GET_CODE (op1) == SUBREG >> + if ((GET_CODE (op0) != SUBREG >> + && (CONSTANT_P (op1) || MEM_P (op1))) >> + || GET_CODE (op1) == SUBREG >> || (REG_P (op1) >> && REGNO (op1) >= FIRST_PSEUDO_REGISTER) >> || (code != CODE_FOR_nothing > > doesn't look like the proper fix for all cases. The actual problem > here is that this part of gen_reload takes the approach to transform > > out <- op0 + op1 > > into > > out <- op0 > out <- out + op1 > > which is invalid if writing to out clobbers op1. > > This means that: > > - The "if" testing whether to swap op0 and op1 should verify > !reg_overlap_mentioned_p (out, op0) > > - Regardless of whether we swapped or not, there needs to be a > gcc_assert (!reg_overlap_mentioned_p (out, op1)); > before the gen_reload (out, op0, opnum, type) line. > > There may still be cases where the algorithm of gen_reload doesn't > work, but at least we'll get an ICE instead of wrong code now. > Those cases will have to be fixed by the back-end as described > above ... > The problem is reload 0 puts OP1 in OUT. Adding gcc_assert (!reg_overlap_mentioned_p (out, op1)); doesn't help in reload 2. How can I check if OP1 overlaps with OUT in previous reload?
On Mon, Jun 27, 2011 at 11:28 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > H.J. Lu wrote: >> On Mon, Jun 27, 2011 at 7:47 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >> > The actual problem >> > here is that this part of gen_reload takes the approach to transform >> > >> > out <- op0 + op1 >> > >> > into >> > >> > out <- op0 >> > out <- out + op1 >> > >> > which is invalid if writing to out clobbers op1. > >> The problem is reload 0 puts OP1 in OUT. Adding >> >> gcc_assert (!reg_overlap_mentioned_p (out, op1)); >> >> doesn't help in reload 2. How can I check if OP1 overlaps with >> OUT in previous reload? > > Sorry, I don't understand how previous reloads come into play here. > gen_reload is supposed to load "in" (which happens to be of the > form op0 + op1) into "out", which means it is of course supposed > to clobber "out" (as long as that doesn't implictly clobber op0 > or op1 before they're used). Any conflicts with other reloads ought > to have been resolved earlier. > > Can you elaborate? > Here is the output from reload: Reloads for insn # 588 Reload 0: reload_in (DI) = (reg/v/f:DI 182 [ b ]) GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0) reload_in_reg: (reg/v/f:DI 182 [ b ]) reload_reg_rtx: (reg:DI 1 dx) Reload 1: reload_in (DI) = (zero_extend:DI (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (const_int 8 [0x8]))) GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0) reload_in_reg: (zero_extend:DI (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (const_int 8 [0x8]))) reload_reg_rtx: (reg:DI 1 dx) Reload 2: reload_out (DF) = (mem:DF (zero_extend:DI (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (const_int 8 [0x8]))) [4 MEM[base: b_96(D), index: D.15020_278, step: 8, offset: 0B]+0 S8 A64]) NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional reload_out_reg: (mem:DF (zero_extend:DI (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (const_int 8 [0x8]))) [4 MEM[base: b_96(D), index: D.15020_278, step: 8, offset: 0B]+0 S8 A64]) leads to (insn 1017 587 1020 34 (set (reg:DI 1 dx) (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 112 [0x70])) [5 %sfp+-208 S8 A64])) spooles.c:291 62 {*movdi_internal_rex64} (nil)) (insn 1020 1017 1022 34 (set (reg:SI 1 dx) (const_int 8 [0x8])) spooles.c:291 64 {*movsi_internal} (nil)) (insn 1022 1020 1023 34 (set (reg:SI 1 dx) (reg:SI 1 dx)) spooles.c:291 64 {*movsi_internal} (nil)) (insn 1023 1022 1024 34 (set (reg:SI 1 dx) (plus:SI (reg:SI 1 dx) (const_int 8 [0x8]))) spooles.c:291 248 {*lea_1_x32} (expr_list:REG_EQUIV (plus:SI (subreg:SI (reg:DI 1 dx) 0) (const_int 8 [0x8])) (nil))) (insn 1024 1023 588 34 (set (reg:DI 1 dx) (zero_extend:DI (reg:SI 1 dx))) spooles.c:291 112 {*zero_extendsidi2_rex64} (expr_list:REG_EQUIV (zero_extend:DI (plus:SI (subreg:SI (reg:DI 1 dx) 0) (const_int 8 [0x8]))) (nil))) (insn 588 1024 589 34 (set (mem:DF (reg:DI 1 dx) [4 MEM[base: b_96(D), index: D.15020_278, step: 8, offset: 0B]+0 S8 A64]) (reg:DF 0 ax [orig:340 D.14980 ] [340])) spooles.c:291 106 {*movdf_internal_rex64} (nil)) Reload 0 puts (reg/v/f:DI 182 [ b ]) in (reg:DI 1 dx) for input. However, reload 2 puts (reg/v/f:DI 182 [ b ]) in (reg:DI 1 dx) for output.without checking what reload 0 did.
On Mon, Jun 27, 2011 at 11:59 AM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > H.J. Lu wrote: > >> Reloads for insn # 588 >> Reload 0: reload_in (DI) =3D (reg/v/f:DI 182 [ b ]) >> GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum =3D 0) >> reload_in_reg: (reg/v/f:DI 182 [ b ]) >> reload_reg_rtx: (reg:DI 1 dx) >> Reload 1: reload_in (DI) =3D (zero_extend:DI (plus:SI (subreg:SI (reg/v/f:D= >> I 182 >> [ b ]) 0) >> (const_int 8 [0x8])= >> )) >> GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum =3D 0) >> reload_in_reg: (zero_extend:DI (plus:SI (subreg:SI (reg/v/f:DI 182 = >> [ b >> ]) 0) >> (const_int 8 [0x8])= >> )) >> reload_reg_rtx: (reg:DI 1 dx) >> Reload 2: reload_out (DF) =3D (mem:DF (zero_extend:DI (plus:SI (subreg:SI >> (reg/v/f:DI 182 [ b ]) 0) >> (const_int 8 >> [0x8]))) [4 MEM[base: b_96(D), index: D.15020_278, step: 8, offset: 0B]+0 S8 >> A64]) >> NO_REGS, RELOAD_FOR_OUTPUT (opnum =3D 0), optional >> reload_out_reg: (mem:DF (zero_extend:DI (plus:SI (subreg:SI (reg/v/= >> f:DI >> 182 [ b ]) 0) >> (const_int 8 >> [0x8]))) [4 MEM[base: b_96(D), index: D.15020_278, step: 8, offset: 0B]+0 S8 >> A64]) >> >> leads to >> > >> (insn 1017 587 1020 34 (set (reg:DI 1 dx) >> (mem/c:DI (plus:DI (reg/f:DI 7 sp) >> (const_int 112 [0x70])) [5 %sfp+-208 S8 A64])) spooles.c:29= >> 1 62 >> {*movdi_internal_rex64} >> (nil)) > > So this is the reload insn generated from reload 0. So far so good. > >> (insn 1020 1017 1022 34 (set (reg:SI 1 dx) >> (const_int 8 [0x8])) spooles.c:291 64 {*movsi_internal} >> (nil)) >> >> (insn 1022 1020 1023 34 (set (reg:SI 1 dx) >> (reg:SI 1 dx)) spooles.c:291 64 {*movsi_internal} >> (nil)) >> >> (insn 1023 1022 1024 34 (set (reg:SI 1 dx) >> (plus:SI (reg:SI 1 dx) >> (const_int 8 [0x8]))) spooles.c:291 248 {*lea_1_x32} >> (expr_list:REG_EQUIV (plus:SI (subreg:SI (reg:DI 1 dx) 0) >> (const_int 8 [0x8])) >> (nil))) >> >> (insn 1024 1023 588 34 (set (reg:DI 1 dx) >> (zero_extend:DI (reg:SI 1 dx))) spooles.c:291 112 >> {*zero_extendsidi2_rex64} >> (expr_list:REG_EQUIV (zero_extend:DI (plus:SI (subreg:SI (reg:DI 1 dx)= >> 0) >> (const_int 8 [0x8]))) >> (nil))) > > All these reload insns are generated from reload 1. > >> (insn 588 1024 589 34 (set (mem:DF (reg:DI 1 dx) [4 MEM[base: b_96(D), inde= >> x: >> D.15020_278, step: 8, offset: 0B]+0 S8 A64]) >> (reg:DF 0 ax [orig:340 D.14980 ] [340])) spooles.c:291 106 >> {*movdf_internal_rex64} >> (nil)) > > This is the original reloaded insn. > >> Reload 0 puts (reg/v/f:DI 182 [ b ]) in (reg:DI 1 dx) for input. >> However, reload 2 >> puts (reg/v/f:DI 182 [ b ]) in (reg:DI 1 dx) for output.without checking w= >> hat >> reload 0 did. > > Reload 2 is an optional reload which reload chose not to utilize, so this > is not really relevant here in any case. There is no output reload. > > The wrong code above originates from how reload 1 is handled: > > gen_reload is called to load the ZERO_EXTEND into (reg:DI 1). This triggers > the "unary predicate" path, which recurses into gen_reload to load the operand > of the ZERO_EXTEND (reg:SI 1), and subsequently generates insn 1024. > > The recursive call loads (plus:SI (subreg:SI (reg:DI 1)) (const_int 8)) into > (reg:SI 1). It attempts to do that in a single SET and fails (for some > reason). It then attempts to load the constant (const_int 8) into the > destination register (insn 1020) [** which is broken **], and re-tries. > This still fails, so it falls through to the last attempt, which is to > instead copy the subreg to the destination (which results in insn 1022 > as the subreg is optimized away at this point), followed by adding the > constant. > > Note that the point marked with "[** which is broken **]" is the place > I pointed out in the previous mail. > reload generates: (insn 914 912 0 (set (reg:SI 0 ax) (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (const_int 8 [0x8]))) 248 {*lea_1_x32} (nil)) from insn = emit_insn_if_valid_for_reload (gen_rtx_SET (VOIDmode, out, in)); Since (reg/v/f:DI 182 [ b ]) is a pseudo register, it is rejected by if ((strict && ! REG_OK_FOR_BASE_STRICT_P (reg)) || (! strict && ! REG_OK_FOR_BASE_NONSTRICT_P (reg))) /* Base is not valid. */ return false; in ix86_legitimate_address_p.
On Mon, Jun 27, 2011 at 3:08 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > H.J. Lu wrote: > >> reload generates: >> >> (insn 914 912 0 (set (reg:SI 0 ax) >> (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) >> (const_int 8 [0x8]))) 248 {*lea_1_x32} >> (nil)) >> >> from >> >> insn = emit_insn_if_valid_for_reload (gen_rtx_SET (VOIDmode, out, in)); > > Interesting. The pseudo should have been replaced by the > hard register (reg:DI 1) during the preceding call to > op0 = find_replacement (&XEXP (in, 0)); > (since reload 0 should have pushed a replacement record.) > > Interestingly enough, in the final output that replacement *is* > performed in the REG_EQUIV note: > > (insn 1023 1022 1024 34 (set (reg:SI 1 dx) > (plus:SI (reg:SI 1 dx) > (const_int 8 [0x8]))) spooles.c:291 248 {*lea_1_x32} > (expr_list:REG_EQUIV (plus:SI (subreg:SI (reg:DI 1 dx) 0) > (const_int 8 [0x8])) > (nil))) > > which is why I hadn't expected this to be a problem here. > > Can you try to find out why the find_replacement doesn't work > with your test case? > I will investigate. Could (reg:SI 1 dx) vs (subreg:SI (reg:DI 1 dx) 0) a problem?
On Mon, Jun 27, 2011 at 3:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Jun 27, 2011 at 3:08 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >> H.J. Lu wrote: >> >>> reload generates: >>> >>> (insn 914 912 0 (set (reg:SI 0 ax) >>> (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) >>> (const_int 8 [0x8]))) 248 {*lea_1_x32} >>> (nil)) >>> >>> from >>> >>> insn = emit_insn_if_valid_for_reload (gen_rtx_SET (VOIDmode, out, in)); >> >> Interesting. The pseudo should have been replaced by the >> hard register (reg:DI 1) during the preceding call to >> op0 = find_replacement (&XEXP (in, 0)); >> (since reload 0 should have pushed a replacement record.) >> >> Interestingly enough, in the final output that replacement *is* >> performed in the REG_EQUIV note: >> >> (insn 1023 1022 1024 34 (set (reg:SI 1 dx) >> (plus:SI (reg:SI 1 dx) >> (const_int 8 [0x8]))) spooles.c:291 248 {*lea_1_x32} >> (expr_list:REG_EQUIV (plus:SI (subreg:SI (reg:DI 1 dx) 0) >> (const_int 8 [0x8])) >> (nil))) >> >> which is why I hadn't expected this to be a problem here. >> >> Can you try to find out why the find_replacement doesn't work >> with your test case? >> > > I will investigate. Could (reg:SI 1 dx) vs (subreg:SI (reg:DI 1 dx) 0) > a problem? > find_replacement never checks subreg: Breakpoint 3, find_replacement (loc=0x7ffff068ab00) at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411 6411 if (reloadreg && r->where == loc) (reg:DI 0 ax) (reg/v/f:DI 182 [ b ]) (gdb) call debug_rtx (*loc) (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (gdb)
diff --git a/gcc/reload1.c b/gcc/reload1.c index 4a697c2..d618a29 100644 --- a/gcc/reload1.c +++ b/gcc/reload1.c @@ -8528,7 +8528,9 @@ gen_reload (rtx out, rtx in, int opnum, enum reload_type type) code = optab_handler (add_optab, GET_MODE (out)); - if (CONSTANT_P (op1) || MEM_P (op1) || GET_CODE (op1) == SUBREG + if ((GET_CODE (op0) != SUBREG + && (CONSTANT_P (op1) || MEM_P (op1))) + || GET_CODE (op1) == SUBREG || (REG_P (op1) && REGNO (op1) >= FIRST_PSEUDO_REGISTER) || (code != CODE_FOR_nothing