The arm patch 161344 to transform TST into LSLS

Submitter Carrot Wei
Date July 2, 2010, 10:06 p.m.
Message ID <AANLkTikKQrpf_o4rkyiH5NXnUsHC-E-0eiQftXcItR3s@mail.gmail.com>
Permalink /patch/57784/
State New

Comments

Carrot Wei - July 2, 2010, 10:06 p.m.
Hi Richard

The following patch has been tested on arm qemu. Could you take a look?

thanks
Guozhi

ChangeLog:
2010-07-02  Wei Guozhi  <carrot@google.com>

        * thumb2.md (peephole2 to convert zero_extract/compare of lowest bits
        to lshift/compare): New.



On Fri, Jul 2, 2010 at 5:15 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
>> Hi Richard
>>
>> The new peephole2 and the old pattern perform different optimizations. As
>> you have described, the peephole2 can optimize the cases that test a
>> single bit in a word. But the old pattern tests whether a bit field at
>> the low end of a word is equal or not equal to zero, and the bit field may
>> contain more than 1 bit. Interestingly, the test case for the old
>> pattern fits both situations. If we change the test case as
>> follows, it shows the regression.
>>
>> struct A
>> {
>>   int v:2;
>> };
>>
>>
>> int bar();
>> int foo(struct A* p)
>> {
>>   if (p->v)
>>     return 1;
>>   return bar();
>> }
>>
>> So we need another peephole2 to bring that optimization back.
>>
>> thanks
>> Guozhi
>
> Yes, a peep2 for that should be pretty straight-forward to generate.
> Simply transform the code into a left-shift and compare with 0, then a
> branch if eq/ne.
>
> R.
>
>
Richard Earnshaw - July 3, 2010, 9:18 a.m.
On 02/07/10 23:06, Carrot Wei wrote:
> Hi Richard
>
> The following patch has been tested on arm qemu. Could you take a look?
>
> thanks
> Guozhi
>
> ChangeLog:
> 2010-07-02  Wei Guozhi  <carrot@google.com>
>
>          * thumb2.md (peephole2 to convert zero_extract/compare of lowest bits
>          to lshift/compare): New.
>

This is basically fine.  But one minor nit.

A lsl ...,#0 in Thumb1 isn't a left shift at all, but a movs (and a
picky assembler might reject the construct, as some versions of the ARM
ARM state that the shift amount should be in the range 1...31).
Fortunately, I doubt this pattern would ever be generated for that case,
since a zero_extract of all 32 bits of an SImode value would simplify
into the original operand.

So please change the limit on op2 to be less than 32.

OK with that change.

R.
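
To make the bound concrete, here is a minimal C sketch of the arithmetic involved. The names `width_ok` and `shift_for_width` are hypothetical helpers, not part of GCC; they only mirror the condition with the requested upper bound (less than 32) and the `32 - INTVAL (operands[2])` preparation statement from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a GCC function): the peephole condition with
   the upper bound tightened as requested, i.e. 0 < width < 32. */
bool width_ok(int width)
{
    return width > 0 && width < 32;
}

/* Hypothetical helper mirroring the preparation statement in the patch:
   operands[2] = GEN_INT (32 - INTVAL (operands[2])). */
int shift_for_width(int width)
{
    return 32 - width;
}

/* Every accepted width yields a shift amount in the range 1...31,
   so a shift by #0 can never be requested. */
void check_accepted_widths(void)
{
    for (int width = -1; width <= 33; width++) {
        if (width_ok(width)) {
            int shift = shift_for_width(width);
            assert(shift >= 1 && shift <= 31);
        }
    }
}
```

With the original `<= 32` bound, a width of 32 would be accepted and `shift_for_width(32)` would be 0, which is exactly the lsl #0 case described above.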
>
> Index: thumb2.md
> ===================================================================
> --- thumb2.md   (revision 161725)
> +++ thumb2.md   (working copy)
> @@ -1501,3 +1501,29 @@
>                                  VOIDmode, operands[0], const0_rtx);
>     ")
>
> +(define_peephole2
> +  [(set (match_operand:CC_NOOV 0 "cc_register" "")
> +       (compare:CC_NOOV (zero_extract:SI
> +                         (match_operand:SI 1 "low_register_operand" "")
> +                         (match_operand:SI 2 "const_int_operand" "")
> +                         (const_int 0))
> +                        (const_int 0)))
> +   (match_scratch:SI 3 "l")
> +   (set (pc)
> +       (if_then_else (match_operator:CC_NOOV 4 "equality_operator"
> +                      [(match_dup 0) (const_int 0)])
> +                     (match_operand 5 "" "")
> +                     (match_operand 6 "" "")))]
> +  "TARGET_THUMB2
> +   && (INTVAL (operands[2]) > 0 && INTVAL (operands[2]) <= 32)"
> +  [(parallel [(set (match_dup 0)
> +                  (compare:CC_NOOV (ashift:SI (match_dup 1) (match_dup 2))
> +                                   (const_int 0)))
> +             (clobber (match_dup 3))])
> +   (set (pc)
> +       (if_then_else (match_op_dup 4 [(match_dup 0) (const_int 0)])
> +                     (match_dup 5) (match_dup 6)))]
> +  "
> +  operands[2] = GEN_INT (32 - INTVAL (operands[2]));
> +  ")
> +

Patch

Index: thumb2.md
===================================================================
--- thumb2.md   (revision 161725)
+++ thumb2.md   (working copy)
@@ -1501,3 +1501,29 @@ 
                                VOIDmode, operands[0], const0_rtx);
   ")

+(define_peephole2
+  [(set (match_operand:CC_NOOV 0 "cc_register" "")
+       (compare:CC_NOOV (zero_extract:SI
+                         (match_operand:SI 1 "low_register_operand" "")
+                         (match_operand:SI 2 "const_int_operand" "")
+                         (const_int 0))
+                        (const_int 0)))
+   (match_scratch:SI 3 "l")
+   (set (pc)
+       (if_then_else (match_operator:CC_NOOV 4 "equality_operator"
+                      [(match_dup 0) (const_int 0)])
+                     (match_operand 5 "" "")
+                     (match_operand 6 "" "")))]
+  "TARGET_THUMB2
+   && (INTVAL (operands[2]) > 0 && INTVAL (operands[2]) <= 32)"
+  [(parallel [(set (match_dup 0)
+                  (compare:CC_NOOV (ashift:SI (match_dup 1) (match_dup 2))
+                                   (const_int 0)))
+             (clobber (match_dup 3))])
+   (set (pc)
+       (if_then_else (match_op_dup 4 [(match_dup 0) (const_int 0)])
+                     (match_dup 5) (match_dup 6)))]
+  "
+  operands[2] = GEN_INT (32 - INTVAL (operands[2]));
+  ")
+
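
The peephole relies on the fact that testing the low bits of a word against zero gives the same eq/ne answer as shifting those bits to the top of the word and comparing the result with zero. The following C sketch illustrates that equivalence under stated assumptions; the function names are hypothetical and the code is not taken from GCC. It assumes a field width between 1 and 31 so that every shift stays well defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test the low `width` bits by masking, as an AND/TST-style sequence would.
   Assumes 1 <= width <= 31 so the mask computation stays defined. */
bool low_bits_nonzero_mask(uint32_t x, int width)
{
    uint32_t mask = (UINT32_C(1) << width) - 1;
    return (x & mask) != 0;
}

/* Same test via a left shift by (32 - width): the bits above the field
   are shifted out, so the result is zero exactly when the field is zero.
   Assumes 1 <= width <= 31, giving a shift amount of 1...31. */
bool low_bits_nonzero_shift(uint32_t x, int width)
{
    return (uint32_t)(x << (32 - width)) != 0;
}

/* Returns true if both forms agree for every width from 1 to 31. */
bool forms_agree(uint32_t x)
{
    for (int width = 1; width <= 31; width++) {
        if (low_bits_nonzero_mask(x, width) != low_bits_nonzero_shift(x, width))
            return false;
    }
    return true;
}
```

For the two-bit field in the test case from the thread, `low_bits_nonzero_shift(x, 2)` corresponds to a shift by 30 followed by a comparison with zero.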