
[AArch64] Force register scaling out of mem ref and comment why

Message ID HE1PR08MB050731A502EB0DBF9734E8EDE7D10@HE1PR08MB0507.eurprd08.prod.outlook.com
State New

Commit Message

Bin Cheng Feb. 4, 2016, 10:11 a.m. UTC
Hi,
There is a performance regression caused by my previous change to aarch64_legitimize_address, in which I forced the constant offset out of the memory ref if the address expr is in the form of "reg1 + reg2 << scale + const".  The intention was to reveal as many loop-invariant opportunities as possible, while depending on GIMPLE optimizers to pick up CSE opportunities of "reg << scale" among different memory references.

Though the assumption still holds, GIMPLE optimizers are not powerful enough to pick up CSE opportunities of register scaling expressions at the current time.  Hence this workaround: the patch forces the register scaling expression out of the memory ref, so that the RTL CSE pass can handle common register scaling expressions, of course at the cost of possibly missed loop invariants.
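
As a hypothetical illustration (not taken from the patch), consider two accesses that share the scaled index "i << 3"; whichever term is split out of the memory refs decides whether we win loop invariants or common subexpressions:

/* Both addresses have the form "reg1 + reg2 << scale + const":
   p + (i << 3) + 8 and q + (i << 3) + 16.  Splitting out the
   constants makes "p + 8" and "q + 16" loop invariant; splitting
   out the scaling instead exposes "i << 3" as a common
   subexpression that RTL CSE can share between the two loads.  */
long
sum (long *p, long *q, long n)
{
  long s = 0;
  for (long i = 0; i < n; i++)
    s += p[i + 1] + q[i + 2];
  return s;
}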

James and I collected perf data; fortunately, this change improves performance for several cases in various benchmarks while not causing any big regression.  It also recovers the big regression we observed before with the previous change.

I also added a comment explaining why the workaround is necessary, and filed PR69653 as an example showing that the tree optimizers should be improved.

Bootstrapped and tested on AArch64.  Is it OK?

Thanks,
bin


2016-02-04  Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/aarch64.c (aarch64_legitimize_address): Force
	register scaling out of memory reference and comment why.

Comments

James Greenhalgh Feb. 4, 2016, 10:14 a.m. UTC | #1
On Thu, Feb 04, 2016 at 10:11:53AM +0000, Bin Cheng wrote:
> Hi,
> There is a performance regression caused by my previous change to
> aarch64_legitimize_address, in which I forced the constant offset out of
> the memory ref if the address expr is in the form of "reg1 + reg2 << scale
> + const".  The intention was to reveal as many loop-invariant opportunities
> as possible, while depending on GIMPLE optimizers to pick up CSE
> opportunities of "reg << scale" among different memory references.
> 
> Though the assumption still holds, GIMPLE optimizers are not powerful
> enough to pick up CSE opportunities of register scaling expressions at the
> current time.  Hence this workaround: the patch forces the register scaling
> expression out of the memory ref, so that the RTL CSE pass can handle
> common register scaling expressions, of course at the cost of possibly
> missed loop invariants.
> 
> James and I collected perf data; fortunately, this change improves
> performance for several cases in various benchmarks while not causing any
> big regression.  It also recovers the big regression we observed before
> with the previous change.
> 
> I also added a comment explaining why the workaround is necessary, and
> filed PR69653 as an example showing that the tree optimizers should be
> improved.
> 
> Bootstrapped and tested on AArch64.  Is it OK?

OK.

Thanks,
James

> 
> Thanks,
> bin
> 
> 
> 2016-02-04  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_legitimize_address): Force
> 	register scaling out of memory reference and comment why.

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..24a7d5b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4926,13 +4926,18 @@  aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
 	       Rt = Ra + Rc;
 	       addr = Rt + Rb<<SCALE.
 
-	     Here we split CONST out of memory referece because:
+	     TODO: We really should split CONST out of the memory
+	     reference because:
 	       a) We depend on GIMPLE optimizers to pick up common sub
 		  expression involving the scaling operation.
 	       b) The index Rb is likely a loop iv, it's better to split
 		  the CONST so that computation of new base Rt is a loop
 		  invariant and can be moved out of loop.  This is more
-		  important when the original base Ra is sfp related.  */
+		  important when the original base Ra is sfp related.
+
+	     Unfortunately, GIMPLE optimizers (e.g., SLSR) cannot handle
+	     this kind of CSE opportunity at the time of this change, so we
+	     have to force the register scaling expr out of the memory ref now.  */
 	  else if (REG_P (op0) || REG_P (op1))
 	    {
 	      machine_mode addr_mode = GET_MODE (x);
@@ -4942,14 +4947,13 @@  aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
 	      if (REG_P (op1))
 		std::swap (op0, op1);
 
-	      rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+	      rtx addr = plus_constant (addr_mode, base, offset);
 
 	      if (aarch64_legitimate_address_hook_p (mode, addr, false))
 		{
-		  base = force_operand (plus_constant (addr_mode,
-						       op0, offset),
+		  base = force_operand (gen_rtx_PLUS (addr_mode, op1, op0),
 					NULL_RTX);
-		  return gen_rtx_PLUS (addr_mode, op1, base);
+		  return plus_constant (addr_mode, base, offset);
 		}
 	    }
 	}
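
In effect, the hunk swaps which term gets forced into the new base register.  Here is a minimal sketch of the two shapes for an address "Ra + Rb << SCALE + CONST", written as plain C for clarity (the names are illustrative, this is not GCC code):

/* Before the patch: CONST is split out, so the new base
   Rt = Ra + CONST can be hoisted when Ra is loop invariant.  */
char *
legitimize_old (char *ra, long rb, int scale, long cst)
{
  char *rt = ra + cst;		/* base = force_operand (Ra + CONST)  */
  return rt + (rb << scale);	/* addr = Rt + Rb << SCALE  */
}

/* After the patch: the scaling is split out, so "Rb << SCALE"
   becomes a separate insn that RTL CSE can share across memory
   references.  */
char *
legitimize_new (char *ra, long rb, int scale, long cst)
{
  char *rt = ra + (rb << scale);  /* base = force_operand (Ra + Rb << SCALE)  */
  return rt + cst;		  /* addr = Rt + CONST  */
}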