Message ID | HE1PR08MB050731A502EB0DBF9734E8EDE7D10@HE1PR08MB0507.eurprd08.prod.outlook.com |
---|---
State | New |
On Thu, Feb 04, 2016 at 10:11:53AM +0000, Bin Cheng wrote:
> Hi,
> There is a performance regression caused by my previous change to
> aarch64_legitimize_address, in which I forced the constant offset out of the
> memory ref if the address expr is in the form of
> "reg1 + reg2 << scale + const".  The intention was to reveal as many
> loop-invariant opportunities as possible, while depending on the GIMPLE
> optimizers to pick up CSE opportunities of "reg << scale" among different
> memory references.
>
> Though the assumption still holds, the GIMPLE optimizers are not currently
> powerful enough to pick up CSE opportunities among register scaling
> expressions.  Here is a workaround: this patch forces the register scaling
> expression out of the memory ref, so that the RTL CSE pass can handle common
> register scaling expressions, at the cost of possibly missed loop invariants.
>
> James and I collected perf data; fortunately this change improves
> performance for several cases in various benchmarks, while not causing any
> big regression.  It also recovers the big regression we observed before with
> the previous change.
>
> I also added a comment explaining why the workaround is necessary, and
> filed PR69653 as an example showing that the tree optimizers should be
> improved.
>
> Bootstrapped and tested on AArch64, is it OK?

OK.

Thanks,
James

> Thanks,
> bin
>
> 2016-02-04  Bin Cheng  <bin.cheng@arm.com>
>
> 	* config/aarch64/aarch64.c (aarch64_legitimize_address): Force
> 	register scaling out of memory reference and comment why.
```diff
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..24a7d5b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4926,13 +4926,18 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
 	 Rt = Ra + Rc;
 	 addr = Rt + Rb<<SCALE.
 
-	 Here we split CONST out of memory referece because:
+	 TODO: We really should split CONST out of memory referece
+	 because:
 	   a) We depend on GIMPLE optimizers to pick up common sub
 	      expression involving the scaling operation.
 	   b) The index Rb is likely a loop iv, it's better to split
 	      the CONST so that computation of new base Rt is a loop
 	      invariant and can be moved out of loop.  This is more
-	      important when the original base Ra is sfp related.  */
+	      important when the original base Ra is sfp related.
+
+	 Unfortunately, GIMPLE optimizers (e.g., SLSR) can not handle
+	 this kind of CSE opportunity at the time of this change, we
+	 have to force register scaling expr out of memory ref now.  */
       else if (REG_P (op0) || REG_P (op1))
 	{
 	  machine_mode addr_mode = GET_MODE (x);
@@ -4942,14 +4947,13 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
 	  if (REG_P (op1))
 	    std::swap (op0, op1);
 
-	  rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+	  rtx addr = plus_constant (addr_mode, base, offset);
 	  if (aarch64_legitimate_address_hook_p (mode, addr, false))
 	    {
-	      base = force_operand (plus_constant (addr_mode,
-						   op0, offset),
+	      base = force_operand (gen_rtx_PLUS (addr_mode, op1, op0),
 				    NULL_RTX);
-	      return gen_rtx_PLUS (addr_mode, op1, base);
+	      return plus_constant (addr_mode, base, offset);
 	    }
 	}
     }
```