Patchwork [combine] Bit-field insertion optimization #2

login
register
mail settings
Submitter Chung-Lin Tang
Date Dec. 9, 2010, 5:21 p.m.
Message ID <4D010FFC.6040805@codesourcery.com>
Download mbox | patch
Permalink /patch/74949/
State New
Headers show

Comments

Chung-Lin Tang - Dec. 9, 2010, 5:21 p.m.
Hi, asides from Andrew's patch, we also have this alternate patch using 
combine. With respect to the bitfield case here, the two patches are 
mostly similar in effect, but we thought we'd post both for discussion 
anyways.

Currently, combine detects and rejects cases where the combine target
insn is a "modification" (LHS is a SUBREG, STRICT_LOW_PART, or
ZERO_EXTRACT), plus one of the prior merged insns has a SET destination
overlap, citing wrong semantics when such two insns are merged.

For the same testcase:
struct bits {
   unsigned a : 5;
   unsigned b : 5;
   unsigned c : 5;
   unsigned d : 5;
};

struct bits
f (unsigned int a)
{
   struct bits bits;
   bits.a = 1;
   bits.b = 2;
   bits.c = 3;
   bits.d = a;
   return bits;
}

The RTL encountered by combine on ARM and fails to optimize is like this:
   2: r138:SI = r0:SI
   6: r139:SI = 1
  23: r136:SI = 0				@
   7: zero_extract [4:0] r136:SI = r139:SI	@ See how r136 is
   8: r140:SI = 2				@ modified throughout
   9: zero_extract [9:5] r136:SI = r140:SI	@ the insns
  10: r141:SI = 3
  11: zero_extract [14:10] r136:SI = r141:SI
  12: zero_extract [19:15] r136:SI = r138:SI
  17: r0:SI = r136:SI
  20: <use r0:SI>

For cases like ARM, this patch enhances combine's handled cases by using
its own expand_field_assignment() function (currently applied for prior
merged insn patterns) to transform the combine target insn which happens
to be a modification into a larger IOR/AND expression; this can turn the
LHS into a normal register write.

For example,
change (zero_extract reg1 #5 #0) = <foo>
into   (reg1) = (ior (and reg1 0xffffffe0) (and <foo> 0x0000001f))

This plus the various propagation and simplification done later in
combine increases the chances it successfully finds a new combined
bitfield operation.

The combine patch is quite small, basically a call to
expand_field_assignment() + another try at combinable_i3pat() before
giving up and returning 0, enabled when flag_expensive_optimizations is set.

With this patch, the testcase generated code for ARM improves from:
         mov     r2, #1
         mov     r3, #0
         bfi     r3, r2, #0, #5
         mov     r2, #2
         bfi     r3, r2, #5, #5
         mov     r2, #3
         bfi     r3, r2, #10, #5
         bfi     r3, r0, #15, #5
         mov     r0, r3
         bx      lr
into
         movw    r3, #3137
         bfi     r3, r0, #15, #5
         mov     r0, r3
         bx      lr

As mentioned, for this bitfield case, Andrew's CSE patch and mine is 
mostly similar in effect. From the ordering of passes however, the CSE 
patch will shadow this combine one if both were applied :P

Thanks,
Chung-Lin

2010-12-09  Chung-Lin Tang  <cltang@codesourcery.com>

	* combine.c (combinable_i3pat): Try field expanding the combine
	target insn when it modifies its LHS output, changing it into
	overwriting form, before failing the combine attempt.
Andrew Pinski - Dec. 9, 2010, 11:12 p.m.
On Thu, Dec 9, 2010 at 9:21 AM, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> Hi, asides from Andrew's patch, we also have this alternate patch using
> combine. With respect to the bitfield case here, the two patches are mostly
> similar in effect, but we thought we'd post both for discussion anyways.

This patch will not bootstrap anywhere because of the warning:
/data/src/gcc-cavium/clean/trunk/src/gcc/combine.c: In function
‘combinable_i3pat’:
/data/src/gcc-cavium/clean/trunk/src/gcc/combine.c:1941: warning:
initialization discards qualifiers from pointer target type

+	      rtx exppat = expand_field_assignment (x);

expand_field_assignment returns a const_rtx.

-- Pinski
Jeff Law - Dec. 10, 2010, 4:04 p.m.
On 12/09/10 10:21, Chung-Lin Tang wrote:
> Hi, asides from Andrew's patch, we also have this alternate patch 
> using combine. With respect to the bitfield case here, the two patches 
> are mostly similar in effect, but we thought we'd post both for 
> discussion anyways.
>
> Currently, combine detects and rejects cases where the combine target
> insn is a "modification" (LHS is a SUBREG, STRICT_LOW_PART, or
> ZERO_EXTRACT), plus one of the prior merged insns has a SET destination
> overlap, citing wrong semantics when such two insns are merged.
>
> For the same testcase:
> struct bits {
>   unsigned a : 5;
>   unsigned b : 5;
>   unsigned c : 5;
>   unsigned d : 5;
> };
>
> struct bits
> f (unsigned int a)
> {
>   struct bits bits;
>   bits.a = 1;
>   bits.b = 2;
>   bits.c = 3;
>   bits.d = a;
>   return bits;
> }
>
> The RTL encountered by combine on ARM and fails to optimize is like this:
>   2: r138:SI = r0:SI
>   6: r139:SI = 1
>  23: r136:SI = 0                @
>   7: zero_extract [4:0] r136:SI = r139:SI    @ See how r136 is
>   8: r140:SI = 2                @ modified throughout
>   9: zero_extract [9:5] r136:SI = r140:SI    @ the insns
>  10: r141:SI = 3
>  11: zero_extract [14:10] r136:SI = r141:SI
>  12: zero_extract [19:15] r136:SI = r138:SI
>  17: r0:SI = r136:SI
>  20: <use r0:SI>
>
> For cases like ARM, this patch enhances combine's handled cases by using
> its own expand_field_assignment() function (currently applied for prior
> merged insn patterns) to transform the combine target insn which happens
> to be a modification into a larger IOR/AND expression; this can turn the
> LHS into a normal register write.
>
> For example,
> change (zero_extract reg1 #5 #0) = <foo>
> into   (reg1) = (ior (and reg1 0xffffffe0) (and <foo> 0x0000001f))
>
> This plus the various propagation and simplification done later in
> combine increases the chances it successfully finds a new combined
> bitfield operation.
>
> The combine patch is quite small, basically a call to
> expand_field_assignment() + another try at combinable_i3pat() before
> giving up and returning 0, enabled when flag_expensive_optimizations 
> is set.
>
> With this patch, the testcase generated code for ARM improves from:
>         mov     r2, #1
>         mov     r3, #0
>         bfi     r3, r2, #0, #5
>         mov     r2, #2
>         bfi     r3, r2, #5, #5
>         mov     r2, #3
>         bfi     r3, r2, #10, #5
>         bfi     r3, r0, #15, #5
>         mov     r0, r3
>         bx      lr
> into
>         movw    r3, #3137
>         bfi     r3, r0, #15, #5
>         mov     r0, r3
>         bx      lr
>
> As mentioned, for this bitfield case, Andrew's CSE patch and mine is 
> mostly similar in effect. From the ordering of passes however, the CSE 
> patch will shadow this combine one if both were applied :P
>
> Thanks,
> Chung-Lin
>
> 2010-12-09  Chung-Lin Tang <cltang@codesourcery.com>
>
>     * combine.c (combinable_i3pat): Try field expanding the combine
>     target insn when it modifies its LHS output, changing it into
>     overwriting form, before failing the combine attempt.
This variant looks pretty reasonable as well.  It has the minor 
advantage that we don't need to move the init-regs pass.

I guess I'd be interested to know which of the two generally works 
better and whether or not if you apply them both, does the new combine 
code ever trigger?

Like the cse patch, I think there should be a failure to optimize bug 
with the testcase and potential patches attached.
jeff
Andrew Pinski - Dec. 13, 2010, 11:06 p.m.
On Thu, Dec 9, 2010 at 9:21 AM, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> Hi, asides from Andrew's patch, we also have this alternate patch using
> combine. With respect to the bitfield case here, the two patches are mostly
> similar in effect, but we thought we'd post both for discussion anyways.
>
> Currently, combine detects and rejects cases where the combine target
> insn is a "modification" (LHS is a SUBREG, STRICT_LOW_PART, or
> ZERO_EXTRACT), plus one of the prior merged insns has a SET destination
> overlap, citing wrong semantics when such two insns are merged.



This patch is broken when we have:
(insn 8 7 14 2 ins-23.c:25 (set (zero_extract:SI (reg:SI 195 [ a ])
            (const_int 8 [0x8])
            (const_int 12 [0xc]))
        (reg:SI 196)) 214 {insvsi} (expr_list:REG_EQUAL (const_int -1
[0xffffffffffffffff])
        (nil)))

It does the correct thing of replacing it with:
(set (reg:SI 195 [ a ])
    (ior:SI (reg:SI 4 $4 [ a ])
        (const_int 1044480 [0xff000])))
But then it does remove the reg_equal from the instruction which
causes us to get:
(set (reg/i:DI 2 $2)
    (const_int -1 [0xffffffffffffffff]))

Thanks,
Andrew Pinski
Chung-Lin Tang - Dec. 14, 2010, 1:41 a.m.
On 2010/12/14 07:06, Andrew Pinski wrote:
> This patch is broken when we have:
> (insn 8 7 14 2 ins-23.c:25 (set (zero_extract:SI (reg:SI 195 [ a ])
>             (const_int 8 [0x8])
>             (const_int 12 [0xc]))
>         (reg:SI 196)) 214 {insvsi} (expr_list:REG_EQUAL (const_int -1
> [0xffffffffffffffff])
>         (nil)))
> 
> It does the correct thing of replacing it with:
> (set (reg:SI 195 [ a ])
>     (ior:SI (reg:SI 4 $4 [ a ])
>         (const_int 1044480 [0xff000])))
> But then it does remove the reg_equal from the instruction which
> causes us to get:
> (set (reg/i:DI 2 $2)
>     (const_int -1 [0xffffffffffffffff]))

Hi Andrew, thanks for testing and finding this. Do you have a testcase I
can use?

Thanks!
Chung-Lin
Andrew Pinski - Dec. 14, 2010, 2:16 a.m.
On Mon, Dec 13, 2010 at 5:41 PM, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> On 2010/12/14 07:06, Andrew Pinski wrote:
>> This patch is broken when we have:
>> (insn 8 7 14 2 ins-23.c:25 (set (zero_extract:SI (reg:SI 195 [ a ])
>>             (const_int 8 [0x8])
>>             (const_int 12 [0xc]))
>>         (reg:SI 196)) 214 {insvsi} (expr_list:REG_EQUAL (const_int -1
>> [0xffffffffffffffff])
>>         (nil)))
>>
>> It does the correct thing of replacing it with:
>> (set (reg:SI 195 [ a ])
>>     (ior:SI (reg:SI 4 $4 [ a ])
>>         (const_int 1044480 [0xff000])))
>> But then it does remove the reg_equal from the instruction which
>> causes us to get:
>> (set (reg/i:DI 2 $2)
>>     (const_int -1 [0xffffffffffffffff]))
>
> Hi Andrew, thanks for testing and finding this. Do you have a testcase I
> can use?


It is hard to reproduce on the trunk as I have some changes in the
mips back-end.  Though we might get the same RTL doing something like:
struct a
{
  long long a:32;
  long long b:8;
  long long c:24;
};
struct a f(struct a a)
{
  a.b=-1;
  return a;
}

Thanks,
Andrew Pinski
Andrew Pinski - Dec. 14, 2010, 8:16 p.m.
On Mon, Dec 13, 2010 at 6:16 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> It is hard to reproduce on the trunk as I have some changes in the
> mips back-end.  Though we might get the same RTL doing something like:
> struct a
> {
>  long long a:32;
>  long long b:8;
>  long long c:24;
> };
> struct a f(struct a a)
> {
>  a.b=-1;
>  return a;
> }

I forgot to mention this is on MIPS64-linux-gnu; well MIPS64r2 really.

-- Pinski

Patch

Index: combine.c
===================================================================
--- combine.c	(revision 167647)
+++ combine.c	(working copy)
@@ -2030,7 +2030,22 @@ 
 					GET_MODE (inner_dest))))
 	  || (i1_not_in_src && reg_overlap_mentioned_p (i1dest, src))
 	  || (i0_not_in_src && reg_overlap_mentioned_p (i0dest, src)))
-	return 0;
+	{
+	  /* Try again, this time expanding the modifying pattern into
+	     an and/ior/shift expression. */
+	  if (flag_expensive_optimizations)
+	    {
+	      rtx exppat = expand_field_assignment (x);
+	      if (GET_CODE (SET_DEST (exppat)) != GET_CODE (SET_DEST (x)))
+		{
+		  SUBST (*loc, exppat);
+		  return combinable_i3pat (i3, loc, i2dest, i1dest, i0dest,
+					   i1_not_in_src, i0_not_in_src,
+					   pi3dest_killed);
+		}
+	    }
+	  return 0;
+	}
 
       /* If DEST is used in I3, it is being killed in this insn, so
 	 record that for later.  We have to consider paradoxical