[ARM,1/3] PR53189: optimizations of 64bit logic operation with constant

Submitted by Carrot Wei on July 18, 2012, 8:20 a.m.

Details

Message ID CAEe8uEAabHqj0zgxNiR5VbsWuDN56Sk4ckaRuzMygj51nbQ5Qw@mail.gmail.com
State New
Headers show

Commit Message

Carrot Wei July 18, 2012, 8:20 a.m.
On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> Carrot,
>
> Sorry about the delayed response.
>
> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>> <ramana.radhakrishnan@linaro.org> wrote:
>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>> Hi
>>>>
>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>> constant operands.
>>>>
>>>
>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>
>>> However I don't see why and /ior / xor with constants that have either
>>> the low or high parts set can't be expanded directly into ands of
>>> subregs with moves of zero's or the original value especially if you
>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>> used for 64 bit arithmetic it gets more interesting.
>>>
>>> Finally this should target PR target/53189.
>>>
>>
>> Hi Ramana
>>
>> Thanks for the review. Following is the updated patch according to
>> your comments.
>
> You've missed answering this part of my review :)
>
>>> However I don't see why and /ior / xor with constants that have either
>>> the low or high parts set can't be expanded directly into ands of
>>> subregs with moves of zero's or the original value especially if you
>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>> used for 64 bit arithmetic it gets more interesting.
>
It has been handled by the const_ok_for_dimode_op and the output part
of corresponding SI mode insn. Let's take the IOR case as an example.

In the const_ok_for_dimode_op patch

Comments

Ramana Radhakrishnan July 18, 2012, 9:39 a.m.
On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote:
> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> Carrot,
>>
>> Sorry about the delayed response.
>>
>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>>> Hi
>>>>>
>>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>> constant operands.
>>>>>
>>>>
>>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>>
>>>> However I don't see why and /ior / xor with constants that have either
>>>> the low or high parts set can't be expanded directly into ands of
>>>> subregs with moves of zero's or the original value especially if you
>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>> used for 64 bit arithmetic it gets more interesting.
>>>>
>>>> Finally this should target PR target/53189.
>>>>
>>>
>>> Hi Ramana
>>>
>>> Thanks for the review. Following is the updated patch according to
>>> your comments.
>>
>> You've missed answering this part of my review :)
>>
>>>> However I don't see why and /ior / xor with constants that have either
>>>> the low or high parts set can't be expanded directly into ands of
>>>> subregs with moves of zero's or the original value especially if you
>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>> used for 64 bit arithmetic it gets more interesting.
>>
> It has been handled by the const_ok_for_dimode_op and the output part
> of corresponding SI mode insn. Let's take the IOR case as an example.
>

I noticed that - If I wasn't clear enough, the question was more
towards generating a subreg move at expand time rather than a split
and handling while outputting asm if you see what I mean.

regards,
Ramana
Carrot Wei July 18, 2012, 10:12 a.m.
On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote:
>> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
>> <ramana.radhakrishnan@linaro.org> wrote:
>>> Carrot,
>>>
>>> Sorry about the delayed response.
>>>
>>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>>>> Hi
>>>>>>
>>>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>>> constant operands.
>>>>>>
>>>>>
>>>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>>>
>>>>> However I don't see why and /ior / xor with constants that have either
>>>>> the low or high parts set can't be expanded directly into ands of
>>>>> subregs with moves of zero's or the original value especially if you
>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>>
>>>>> Finally this should target PR target/53189.
>>>>>
>>>>
>>>> Hi Ramana
>>>>
>>>> Thanks for the review. Following is the updated patch according to
>>>> your comments.
>>>
>>> You've missed answering this part of my review :)
>>>
>>>>> However I don't see why and /ior / xor with constants that have either
>>>>> the low or high parts set can't be expanded directly into ands of
>>>>> subregs with moves of zero's or the original value especially if you
>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>> used for 64 bit arithmetic it gets more interesting.
>>>
>> It has been handled by the const_ok_for_dimode_op and the output part
>> of corresponding SI mode insn. Let's take the IOR case as an example.
>>
>
> I noticed that - If I wasn't clear enough, the question was more
> towards generating a subreg move at expand time rather than a split
> and handling while outputting asm if you see what I mean.
>
I see your point now. I don't know how much better if we handle it
earlier. Even if I generates subreg move for non-neon code at expand
time, the latter output handling is still necessary for neon insns. Do
you think it deserves the extra expand handling?

thanks
Carrot
Ramana Radhakrishnan Aug. 3, 2012, 6:57 p.m.
On 18 July 2012 11:12, Carrot Wei <carrot@google.com> wrote:
> On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote:
>>> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>> Carrot,
>>>>
>>>> Sorry about the delayed response.
>>>>
>>>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>>>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>>>> constant operands.
>>>>>>>
>>>>>>
>>>>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>>>>
>>>>>> However I don't see why and /ior / xor with constants that have either
>>>>>> the low or high parts set can't be expanded directly into ands of
>>>>>> subregs with moves of zero's or the original value especially if you
>>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>>>
>>>>>> Finally this should target PR target/53189.
>>>>>>
>>>>>
>>>>> Hi Ramana
>>>>>
>>>>> Thanks for the review. Following is the updated patch according to
>>>>> your comments.
>>>>
>>>> You've missed answering this part of my review :)
>>>>
>>>>>> However I don't see why and /ior / xor with constants that have either
>>>>>> the low or high parts set can't be expanded directly into ands of
>>>>>> subregs with moves of zero's or the original value especially if you
>>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>
>>> It has been handled by the const_ok_for_dimode_op and the output part
>>> of corresponding SI mode insn. Let's take the IOR case as an example.
>>>
>>
>> I noticed that - If I wasn't clear enough, the question was more
>> towards generating a subreg move at expand time rather than a split
>> and handling while outputting asm if you see what I mean.
>>
> I see your point now. I don't know how much better if we handle it
> earlier. Even if I generates subreg move for non-neon code at expand
> time, the latter output handling is still necessary for neon insns.

Just a quick note to let you know that I had a look this week and I'd
rather have the splitters deal with the idioms properly rather than
change the SImode patterns to deal with mov's . In theory with the
proper idiom type splitting dead loads can disappear and a number of
your tests that do and with the upper 32 bits as 0 don't really need a
load from memory and so on. I'd rather have it split before reload in
that form if possible. The neon case is something I don't yet have an
answer to but I'm gonna take a look.



regards,
ramana

Patch hide | download patch | download mbox

--- arm.c	(revision 189278)
+++ arm.c	(working copy)
@@ -2524,6 +2524,16 @@ 
     case PLUS:
       return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);

+    case IOR:
+      if ((const_ok_for_arm (hi_val) || (hi_val == 0xFFFFFFFF))
+	  && (const_ok_for_arm (lo_val) || (lo_val == 0xFFFFFFFF)))
+	return 1;
+      if (TARGET_THUMB2
+	  && (const_ok_for_arm (lo_val) || const_ok_for_arm (~lo_val))
+	  && (const_ok_for_arm (hi_val) || const_ok_for_arm (~hi_val)))
+	return 1;
+      return 0;
+
     default:
       return 0;
     }

The 0xFFFFFFFF is not valid arm mode immediate, but ior 0XFFFFFFFF
results in all 1's, so it is also allowed in an iordi3 insn. And the
patch in iorsi3_insn pattern explicitly check the all 0's and all 1's
cases, and output either a simple register mov instruction or
instruction mov all 1's to the destination.

@@ -2902,10 +2915,29 @@ 
 	(ior:SI (match_operand:SI 1 "s_register_operand" "%r,r,r")
 		(match_operand:SI 2 "reg_or_int_operand" "rI,K,?n")))]
   "TARGET_32BIT"
-  "@
-   orr%?\\t%0, %1, %2
-   orn%?\\t%0, %1, #%B2
-   #"
+  "*
+  {
+    if (CONST_INT_P (operands[2]))
+      {
+	HOST_WIDE_INT i = INTVAL (operands[2]) & 0xFFFFFFFF;
+	if (i == 0xFFFFFFFF)
+	  return \"mvn%?\\t%0, #0\";
+	if (i == 0)
+	  {
+	    if (!rtx_equal_p (operands[0], operands[1]))
+	      return \"mov%?\\t%0, %1\";
+	    else
+	      return \"\";
+	  }
+      }
+
+    switch (which_alternative)
+      {
+      case 0: return \"orr%?\\t%0, %1, %2\";
+      case 1: return \"orn%?\\t%0, %1, #%B2\";
+      case 2: return \"#\";
+      }
+  }"
   "TARGET_32BIT
    && GET_CODE (operands[2]) == CONST_INT
    && !(const_ok_for_arm (INTVAL (operands[2]))


> Is there any reason why we don't split such cases earlier into the
> constituent moves and the associated ands earlier than reload in the
> non-Neon case?
>
I referenced pattern arm_adddi3 which is split after reload_completed.
And the pattern arm_subdi3 is even not split. I guess they keep the
original constant longer may benefit some optimizations involving
constants. But it may also lose flexibility in other cases.

>  In addition, it would be good to have some tests for Thumb2 that deal
> with the replicated constants for Thumb2 . Can you have a look at
> creating some tests similar to the thumb2*replicated*.c tests in
> gcc.target/arm but for 64 bit constants ?
>

The new test cases involving thumb2 replicated constants are added as following.

thanks
Carrot



2012-07-18  Wei Guozhi  <carrot@google.com>

	PR target/53189
	* gcc.target/arm/pr53189-10.c: New testcase.
	* gcc.target/arm/pr53189-11.c: New testcase.
	* gcc.target/arm/pr53189-12.c: New testcase.



Index: pr53189-10.c
===================================================================
--- pr53189-10.c	(revision 0)
+++ pr53189-10.c	(revision 0)
@@ -0,0 +1,9 @@ 
+/* { dg-options "-mthumb -O2" }  */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler "and.*#-16843010" } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x9fefefefe;
+}


Index: pr53189-11.c
===================================================================
--- pr53189-11.c	(revision 0)
+++ pr53189-11.c	(revision 0)
@@ -0,0 +1,9 @@ 
+/* { dg-options "-mthumb -O2" }  */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler "eor.*#-1426019584" } } */
+
+void t0p(long long * p)
+{
+  *p ^= 0x7ab00ab00;
+}


Index: pr53189-12.c
===================================================================
--- pr53189-12.c	(revision 0)
+++ pr53189-12.c	(revision 0)
@@ -0,0 +1,9 @@ 
+/* { dg-options "-mthumb -O2" }  */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler "orr.*#13435085" } } */
+
+void t0p(long long * p)
+{
+  *p |= 0x500cd00cd;
+}