Patchwork [CSE] Bit-field insertion optimization

Submitter Andrew Stubbs
Date Dec. 9, 2010, 4:45 p.m.
Message ID <4D010797.60004@codesourcery.com>
Permalink /patch/74939/
State New

Comments

Andrew Stubbs - Dec. 9, 2010, 4:45 p.m.
The attached patch fixes a bug in which constant assignments to bit 
fields are improperly optimized. I'm seeing this on ARM, but I imagine 
it affects other targets similarly.

The first problem is that CSE cannot determine that the result is 
constant because the auto-variable is implicitly initialized. I have 
solved this by moving up the init-regs pass to before cse2. It might be 
better to move it before cse1, but that's a bigger change, so I wasn't sure?

The second problem is that the pattern match for ZERO_EXTRACT requires 
that the operand is an immediate constant, which is never the case on 
ARM (and presumably is only the case with a limited range of inputs even 
on other targets). I have added code to detect known-constant input 
registers.

Test case:

   struct bits
   {
      unsigned a:5;
      unsigned b:5;
      unsigned c:5;
      unsigned d:5;
   };

   struct bits
   f (unsigned int a)
   {
      struct bits bits = {0,0,0,0};
      bits.a = 1;
      bits.b = 2;
      bits.c = 3;
      bits.d = a;
      return bits;
   }


Before, compiled for ARM with "-O2 -mcpu=cortex-a8 -mthumb":
         movs    r2, #1
         movs    r3, #0
         bfi     r3, r2, #0, #5
         movs    r2, #2
         bfi     r3, r2, #5, #5
         movs    r2, #3
         bfi     r3, r2, #10, #5
         bfi     r3, r0, #15, #5
         mov     r0, r3
         bx      lr

After:
         movw    r3, #3137
         bfi     r3, r0, #15, #5
         mov     r0, r3
         bx      lr
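The fold that collapses the three constant insertions into a single `movw` can be modelled outside the compiler. The sketch below is illustrative Python, not GCC code; the function name and argument order are invented, but the masking logic mirrors what cse_insn does with the ZERO_EXTRACT destination:

```python
def fold_zero_extract(dest_val, src_val, width, pos):
    # Model of CSE folding (set (zero_extract dest width pos) src)
    # when both dest and src are known constants: clear the field,
    # then OR in the masked source bits.
    mask = (1 << width) - 1
    dest_val &= ~(mask << pos)
    dest_val |= (src_val & mask) << pos
    return dest_val

# The three constant insertions from the test case above:
v = 0
v = fold_zero_extract(v, 1, 5, 0)   # bits.a = 1
v = fold_zero_extract(v, 2, 5, 5)   # bits.b = 2
v = fold_zero_extract(v, 3, 5, 10)  # bits.c = 3
print(v)  # 3137 -- the constant in "movw r3, #3137"
```

The fourth field (`bits.d = a`) depends on the function argument, so its `bfi` necessarily survives in the "After" output.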

OK for commit, once stage 1 opens again?

Andrew
Paolo Bonzini - Dec. 9, 2010, 5:47 p.m.
On 12/09/2010 05:45 PM, Andrew Stubbs wrote:
> The first problem is that CSE cannot determine that the result is
> constant because the auto-variable is implicitly initialized. I have
> solved this by moving up the init-regs pass to before cse2. It might be
> better to move it before cse1, but that's a bigger change, so I wasn't
> sure?

I think that would be fine independent of this change.

> The second problem is that the pattern match for ZERO_EXTRACT requires
> that the operand is an immediate constant, which is never the case on
> ARM (and presumably is only the case with a limited range of inputs even
> on other targets). I have added code to detect known-constant input
> registers.

Since I can choose, I prefer Chung-Lin's patch. :)

Paolo
Andrew Stubbs - Dec. 10, 2010, 9:46 a.m.
On 09/12/10 17:47, Paolo Bonzini wrote:
> On 12/09/2010 05:45 PM, Andrew Stubbs wrote:
>> The first problem is that CSE cannot determine that the result is
>> constant because the auto-variable is implicitly initialized. I have
>> solved this by moving up the init-regs pass to before cse2. It might be
>> better to move it before cse1, but that's a bigger change, so I wasn't
>> sure?
>
> I think that would be fine independent of this change.
>
>> The second problem is that the pattern match for ZERO_EXTRACT requires
>> that the operand is an immediate constant, which is never the case on
>> ARM (and presumably is only the case with a limited range of inputs even
>> on other targets). I have added code to detect known-constant input
>> registers.
>
> Since I can choose, I prefer Chung-Lin's patch. :)

So, FAOD (for the avoidance of doubt), you think the init-regs pass should 
be moved, but the ZERO_EXTRACT code should be dropped in favour of the 
other patch?

Should the init-regs be moved before cse1 or cse2?

I'm happy to resubmit such a patch, but would it be applicable now, or 
would I have to wait for stage one?

Andrew
Paolo Bonzini - Dec. 10, 2010, 9:56 a.m.
On 12/10/2010 10:46 AM, Andrew Stubbs wrote:
>
> Should the init-regs be moved before cse1 or cse2?

CSE1.

We definitely need to catch every opportunity to get rid of the 
initializations, so moving the pass earlier is nice.

> I'm happy to resubmit such a patch, but would it be applicable now, or
> would I have to wait for stage one?

Try running CSiBE or comparing a set of .i files.  If you see any case 
in which the change stands out as particularly beneficial, it may even 
be fine now.  Otherwise, including if there are size regressions, it 
should wait.

Paolo
Richard Guenther - Dec. 10, 2010, 12:33 p.m.
On Fri, Dec 10, 2010 at 10:56 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 12/10/2010 10:46 AM, Andrew Stubbs wrote:
>>
>> Should the init-regs be moved before cse1 or cse2?
>
> CSE1.
>
> We definitely need to catch every opportunity to get rid of the
> initializations, so moving the pass earlier is nice.
>
>> I'm happy to resubmit such a patch, but would it be applicable now, or
>> would I have to wait for stage one?
>
> Try running CSiBE or comparing a set of .i files.  If you see any case in
> which the change stands out as particularly beneficial, it may even be fine
> now.  Otherwise, including if there are size regressions, it should wait.

I think we don't want to re-order passes at this point in time.

Richard.

> Paolo
>
Jeff Law - Dec. 10, 2010, 4 p.m.
On 12/09/10 09:45, Andrew Stubbs wrote:
> The attached patch fixes a bug in which constant assignments to bit 
> fields are improperly optimized. I'm seeing this on ARM, but I imagine 
> it affects other targets similarly.
>
> The first problem is that CSE cannot determine that the result is 
> constant because the auto-variable is implicitly initialized. I have 
> solved this by moving up the init-regs pass to before cse2. It might 
> be better to move it before cse1, but that's a bigger change, so I 
> wasn't sure?
If this change ends up being made, I would strongly recommend a comment 
about why the change was made be placed in passes.c.

I would hazard a guess that the pass was placed late in the pipeline 
because there wasn't any value seen in running it prior to combine.  The 
only thing I'd be concerned about would be introducing useless 
initializations if we move the pass earlier since fewer optimizers would 
have been run and thus more unexecutable paths are still in the CFG.  It 
would be particularly interesting to run some benchmarks with  only this 
change and see how codesize is affected.

>
> The second problem is that the pattern match for ZERO_EXTRACT requires 
> that the operand is an immediate constant, which is never the case on 
> ARM (and presumably is only the case with a limited range of inputs 
> even on other targets). I have added code to detect known-constant 
> input registers.
This seems fairly reasonable.

  I believe the preferred method for queuing pending patches is to open 
a bug for failure to optimize, attach the patches & testcase to that 
bug.  Then make the GCC 4.7 pending patches meta-bug depend on the 
failure to optimize bug.

Jeff
Andrew Stubbs - Dec. 10, 2010, 5:03 p.m.
On 10/12/10 16:00, Jeff Law wrote:
>   I believe the preferred method for queuing pending patches is to open
> a bug for failure to optimize, attach the patches & testcase to that
> bug.  Then make the GCC 4.7 pending patches meta-bug depend on the
> failure to optimize bug.

Now filed here:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46888

Andrew

Patch

2010-12-09  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* cse.c (cse_insn): Add support for ZERO_EXTRACT with a register
	source operand.
	* passes.c (init_optimization_passes): Move initialize_regs
	before cse2.

---
 src/gcc-mainline/gcc/cse.c    |   24 ++++++++++++++++++++----
 src/gcc-mainline/gcc/passes.c |    2 +-
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/gcc-mainline/gcc/cse.c b/src/gcc-mainline/gcc/cse.c
index 3ab6b37..d283040 100644
--- a/src/gcc-mainline/gcc/cse.c
+++ b/src/gcc-mainline/gcc/cse.c
@@ -5036,7 +5036,6 @@  cse_insn (rtx insn)
 	     (set (zero_extract:M2 (reg:M N) (const_int C) (const_int D))
 		  (reg:M2 O)).  */
 	  if (GET_CODE (SET_DEST (sets[i].rtl)) == ZERO_EXTRACT
-	      && CONST_INT_P (trial)
 	      && CONST_INT_P (XEXP (SET_DEST (sets[i].rtl), 1))
 	      && CONST_INT_P (XEXP (SET_DEST (sets[i].rtl), 2))
 	      && REG_P (XEXP (SET_DEST (sets[i].rtl), 0))
@@ -5052,9 +5051,26 @@  cse_insn (rtx insn)
 	      unsigned int dest_hash = HASH (dest_reg, GET_MODE (dest_reg));
 	      struct table_elt *dest_elt
 		= lookup (dest_reg, dest_hash, GET_MODE (dest_reg));
-	      rtx dest_cst = NULL;
+	      rtx dest_cst = NULL, src_cst = NULL;
 
-	      if (dest_elt)
+	      if (CONST_INT_P (trial))
+		 src_cst = trial;
+	      else if (REG_P (trial))
+		{
+		  unsigned int src_hash = HASH (trial, GET_MODE (trial));
+		  struct table_elt *src_elt
+		    = lookup (trial, src_hash, GET_MODE (trial));
+
+		  if (src_elt)
+		    for (p = src_elt->first_same_value; p; p = p->next_same_value)
+		      if (p->is_const && CONST_INT_P (p->exp))
+			{
+			  src_cst = p->exp;
+			  break;
+			}
+		}
+
+	      if (src_cst && dest_elt)
 		for (p = dest_elt->first_same_value; p; p = p->next_same_value)
 		  if (p->is_const && CONST_INT_P (p->exp))
 		    {
@@ -5076,7 +5092,7 @@  cse_insn (rtx insn)
 		  else
 		    mask = ((HOST_WIDE_INT) 1 << INTVAL (width)) - 1;
 		  val &= ~(mask << shift);
-		  val |= (INTVAL (trial) & mask) << shift;
+		  val |= (INTVAL (src_cst) & mask) << shift;
 		  val = trunc_int_for_mode (val, GET_MODE (dest_reg));
 		  validate_unshare_change (insn, &SET_DEST (sets[i].rtl),
 					   dest_reg, 1);
diff --git a/src/gcc-mainline/gcc/passes.c b/src/gcc-mainline/gcc/passes.c
index 4be61a9..c1c656d 100644
--- a/src/gcc-mainline/gcc/passes.c
+++ b/src/gcc-mainline/gcc/passes.c
@@ -990,11 +990,11 @@  init_optimization_passes (void)
 	}
       NEXT_PASS (pass_web);
       NEXT_PASS (pass_rtl_cprop);
+      NEXT_PASS (pass_initialize_regs);
       NEXT_PASS (pass_cse2);
       NEXT_PASS (pass_rtl_dse1);
       NEXT_PASS (pass_rtl_fwprop_addr);
       NEXT_PASS (pass_inc_dec);
-      NEXT_PASS (pass_initialize_regs);
       NEXT_PASS (pass_ud_rtl_dce);
       NEXT_PASS (pass_combine);
       NEXT_PASS (pass_if_after_combine);