Patchwork [PATCH:,PR,target/46631] Change operands order so we can use 16bit AND instead of 32bit in thumb2

login
register
mail settings
Submitter Carrot Wei
Date Dec. 1, 2010, 7:12 p.m.
Message ID <AANLkTi=kW=5xPhdPMZ06Lk3gOJ_w43ifAJWWCgX9Upft@mail.gmail.com>
Download mbox | patch
Permalink /patch/73877/
State New
Headers show

Comments

Carrot Wei - Dec. 1, 2010, 7:12 p.m.
On Wed, Dec 1, 2010 at 4:04 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Tue, 2010-11-30 at 15:21 -0800, Carrot Wei wrote:
>> Hi
>>
>> An instruction of the following style is 32 bit in thumb2.
>>     and    r2, r3, r2
>>
>> If we change the source operands order, it will be 16 bit.
>>     ands    r2, r2, r3
>>
>> This patch contains a new peephole2 to detect the situation that the all 3
>> operands of AND are low registers, and the target is the same as the second
>> source, then replace it with another AND with its source operands exchanged.
>>
>> This patch passed the regression test on arm qemu.
>>
>
> So gas will already generate a 16-bit instruction from
>        ands    r2, r3, r2
>
> So it should be possible to just extend the existing peephole/split to
> handle this case (and for all other commutative operations).
>

Do you mean I should do the following modification to pattern
"*arm_andsi3_insn" ?


And do the similar to patterns "*iorsi3_insn" and "*arm_xorsi3"?

thanks
Guozhi
Richard Earnshaw - Dec. 2, 2010, 12:26 a.m.
Absolutely not!  That pattern does not clobber the condition codes. Look for the existing pattern that generates a two register and operation for thumb2 and update that. 

R. 

On 1 Dec 2010, at 19:12, "Carrot Wei" <carrot@google.com> wrote:

> On Wed, Dec 1, 2010 at 4:04 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> 
>> On Tue, 2010-11-30 at 15:21 -0800, Carrot Wei wrote:
>>> Hi
>>> 
>>> An instruction of the following style is 32 bit in thumb2.
>>>    and    r2, r3, r2
>>> 
>>> If we change the source operands order, it will be 16 bit.
>>>    ands    r2, r2, r3
>>> 
>>> This patch contains a new peephole2 to detect the situation that the all 3
>>> operands of AND are low registers, and the target is the same as the second
>>> source, then replace it with another AND with its source operands exchanged.
>>> 
>>> This patch passed the regression test on arm qemu.
>>> 
>> 
>> So gas will already generate a 16-bit instruction from
>>       ands    r2, r3, r2
>> 
>> So it should be possible to just extend the existing peephole/split to
>> handle this case (and for all other commutative operations).
>> 
> 
> Do you mean I should do the following modification to pattern
> "*arm_andsi3_insn" ?
> 
> Index: arm.md
> ===================================================================
> --- arm.md    (revision 167257)
> +++ arm.md    (working copy)
> @@ -2094,15 +2094,26 @@
>   and%?\\t%0, %1, %2
>   bic%?\\t%0, %1, #%B2
>   #"
> -  "TARGET_32BIT
> -   && GET_CODE (operands[2]) == CONST_INT
> -   && !(const_ok_for_arm (INTVAL (operands[2]))
> -    || const_ok_for_arm (~INTVAL (operands[2])))"
> -  [(clobber (const_int 0))]
> +  "TARGET_32BIT"
> +  [(parallel
> +    [(set (match_dup 0)
> +          (and:SI (match_dup 0)
> +                  (match_dup 1)))
> +     (clobber (reg:CC CC_REGNUM))])]
>  "
> -  arm_split_constant  (AND, SImode, curr_insn,
> -                   INTVAL (operands[2]), operands[0], operands[1], 0);
> -  DONE;
> +  if (GET_CODE (operands[2]) == CONST_INT
> +      && !(const_ok_for_arm (INTVAL (operands[2]))
> +           || const_ok_for_arm (~INTVAL (operands[2])))"
> +    {
> +      arm_split_constant  (AND, SImode, curr_insn,
> +                       INTVAL (operands[2]), operands[0], operands[1], 0);
> +      DONE;
> +    }
> +  else if (TARGET_THUMB2 && rtx_equal_p (operands[0], operands[2])
> +           && peep2_regno_dead_p(0, CC_REGNUM)
> +    {
> +    }
> +  FAIL;
>  "
>  [(set_attr "length" "4,4,16")
>   (set_attr "predicable" "yes")]
> 
> And do the similar to patterns "*iorsi3_insn" and "*arm_xorsi3"?
> 
> thanks
> Guozhi
Carrot Wei - Dec. 2, 2010, 1:54 a.m.
So you mean pattern "*andsi3_compare0"?

It doesn't work since in the original instruction "and    r2, r3, r2",
it also doesn't clobber the condition code. So we must add the clobber
explicitely. It looks peephole2 is still the best solution.

thanks
Guozhi

On Wed, Dec 1, 2010 at 4:26 PM, Richard Earnshaw
<richard.earnshaw@buzzard.freeserve.co.uk> wrote:
> Absolutely not!  That pattern does not clobber the condition codes. Look for the existing pattern that generates a two register and operation for thumb2 and update that.
>
> R.
>
> On 1 Dec 2010, at 19:12, "Carrot Wei" <carrot@google.com> wrote:
>
>> On Wed, Dec 1, 2010 at 4:04 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>>
>>> On Tue, 2010-11-30 at 15:21 -0800, Carrot Wei wrote:
>>>> Hi
>>>>
>>>> An instruction of the following style is 32 bit in thumb2.
>>>>    and    r2, r3, r2
>>>>
>>>> If we change the source operands order, it will be 16 bit.
>>>>    ands    r2, r2, r3
>>>>
>>>> This patch contains a new peephole2 to detect the situation that the all 3
>>>> operands of AND are low registers, and the target is the same as the second
>>>> source, then replace it with another AND with its source operands exchanged.
>>>>
>>>> This patch passed the regression test on arm qemu.
>>>>
>>>
>>> So gas will already generate a 16-bit instruction from
>>>       ands    r2, r3, r2
>>>
>>> So it should be possible to just extend the existing peephole/split to
>>> handle this case (and for all other commutative operations).
>>>
>>
>> Do you mean I should do the following modification to pattern
>> "*arm_andsi3_insn" ?
>>
>> Index: arm.md
>> ===================================================================
>> --- arm.md    (revision 167257)
>> +++ arm.md    (working copy)
>> @@ -2094,15 +2094,26 @@
>>   and%?\\t%0, %1, %2
>>   bic%?\\t%0, %1, #%B2
>>   #"
>> -  "TARGET_32BIT
>> -   && GET_CODE (operands[2]) == CONST_INT
>> -   && !(const_ok_for_arm (INTVAL (operands[2]))
>> -    || const_ok_for_arm (~INTVAL (operands[2])))"
>> -  [(clobber (const_int 0))]
>> +  "TARGET_32BIT"
>> +  [(parallel
>> +    [(set (match_dup 0)
>> +          (and:SI (match_dup 0)
>> +                  (match_dup 1)))
>> +     (clobber (reg:CC CC_REGNUM))])]
>>  "
>> -  arm_split_constant  (AND, SImode, curr_insn,
>> -                   INTVAL (operands[2]), operands[0], operands[1], 0);
>> -  DONE;
>> +  if (GET_CODE (operands[2]) == CONST_INT
>> +      && !(const_ok_for_arm (INTVAL (operands[2]))
>> +           || const_ok_for_arm (~INTVAL (operands[2])))"
>> +    {
>> +      arm_split_constant  (AND, SImode, curr_insn,
>> +                       INTVAL (operands[2]), operands[0], operands[1], 0);
>> +      DONE;
>> +    }
>> +  else if (TARGET_THUMB2 && rtx_equal_p (operands[0], operands[2])
>> +           && peep2_regno_dead_p(0, CC_REGNUM)
>> +    {
>> +    }
>> +  FAIL;
>>  "
>>  [(set_attr "length" "4,4,16")
>>   (set_attr "predicable" "yes")]
>>
>> And do the similar to patterns "*iorsi3_insn" and "*arm_xorsi3"?
>>
>> thanks
>> Guozhi
>
>
>
>

Patch

Index: arm.md
===================================================================
--- arm.md	(revision 167257)
+++ arm.md	(working copy)
@@ -2094,15 +2094,26 @@ 
    and%?\\t%0, %1, %2
    bic%?\\t%0, %1, #%B2
    #"
-  "TARGET_32BIT
-   && GET_CODE (operands[2]) == CONST_INT
-   && !(const_ok_for_arm (INTVAL (operands[2]))
-	|| const_ok_for_arm (~INTVAL (operands[2])))"
-  [(clobber (const_int 0))]
+  "TARGET_32BIT"
+  [(parallel
+    [(set (match_dup 0)
+          (and:SI (match_dup 0)
+                  (match_dup 1)))
+     (clobber (reg:CC CC_REGNUM))])]
   "
-  arm_split_constant  (AND, SImode, curr_insn,
-	               INTVAL (operands[2]), operands[0], operands[1], 0);
-  DONE;
+  if (GET_CODE (operands[2]) == CONST_INT
+      && !(const_ok_for_arm (INTVAL (operands[2]))
+           || const_ok_for_arm (~INTVAL (operands[2])))"
+    {
+      arm_split_constant  (AND, SImode, curr_insn,
+	                   INTVAL (operands[2]), operands[0], operands[1], 0);
+      DONE;
+    }
+  else if (TARGET_THUMB2 && rtx_equal_p (operands[0], operands[2])
+           && peep2_regno_dead_p(0, CC_REGNUM)
+    {
+    }
+  FAIL;
   "
   [(set_attr "length" "4,4,16")
    (set_attr "predicable" "yes")]