Message ID | CAEe8uEAabHqj0zgxNiR5VbsWuDN56Sk4ckaRuzMygj51nbQ5Qw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote: > On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan > <ramana.radhakrishnan@linaro.org> wrote: >> Carrot, >> >> Sorry about the delayed response. >> >> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote: >>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan >>> <ramana.radhakrishnan@linaro.org> wrote: >>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote: >>>>> Hi >>>>> >>>>> This is the second part of the patches that deals with 64bit and. It directly >>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit >>>>> constant operands. >>>>> >>>> >>>> Comments about const_di_ok_for_op still apply from my review of your add patch. >>>> >>>> However I don't see why and /ior / xor with constants that have either >>>> the low or high parts set can't be expanded directly into ands of >>>> subregs with moves of zero's or the original value especially if you >>>> aren't looking at doing 64 bit operations in neon .With Neon being >>>> used for 64 bit arithmetic it gets more interesting. >>>> >>>> Finally this should target PR target/53189. >>>> >>> >>> Hi Ramana >>> >>> Thanks for the review. Following is the updated patch according to >>> your comments. >> >> You've missed answering this part of my review :) >> >>>> However I don't see why and /ior / xor with constants that have either >>>> the low or high parts set can't be expanded directly into ands of >>>> subregs with moves of zero's or the original value especially if you >>>> aren't looking at doing 64 bit operations in neon .With Neon being >>>> used for 64 bit arithmetic it gets more interesting. >> > It has been handled by the const_ok_for_dimode_op and the output part > of corresponding SI mode insn. Let's take the IOR case as an example. > I noticed that - If I wasn't clear enough, the question was more towards generating a subreg move at expand time rather than a split and handling while outputting asm if you see what I mean. regards, Ramana
On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> wrote: > On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote: >> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan >> <ramana.radhakrishnan@linaro.org> wrote: >>> Carrot, >>> >>> Sorry about the delayed response. >>> >>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote: >>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan >>>> <ramana.radhakrishnan@linaro.org> wrote: >>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote: >>>>>> Hi >>>>>> >>>>>> This is the second part of the patches that deals with 64bit and. It directly >>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit >>>>>> constant operands. >>>>>> >>>>> >>>>> Comments about const_di_ok_for_op still apply from my review of your add patch. >>>>> >>>>> However I don't see why and /ior / xor with constants that have either >>>>> the low or high parts set can't be expanded directly into ands of >>>>> subregs with moves of zero's or the original value especially if you >>>>> aren't looking at doing 64 bit operations in neon .With Neon being >>>>> used for 64 bit arithmetic it gets more interesting. >>>>> >>>>> Finally this should target PR target/53189. >>>>> >>>> >>>> Hi Ramana >>>> >>>> Thanks for the review. Following is the updated patch according to >>>> your comments. >>> >>> You've missed answering this part of my review :) >>> >>>>> However I don't see why and /ior / xor with constants that have either >>>>> the low or high parts set can't be expanded directly into ands of >>>>> subregs with moves of zero's or the original value especially if you >>>>> aren't looking at doing 64 bit operations in neon .With Neon being >>>>> used for 64 bit arithmetic it gets more interesting. >>> >> It has been handled by the const_ok_for_dimode_op and the output part >> of corresponding SI mode insn. Let's take the IOR case as an example. >> > > I noticed that - If I wasn't clear enough, the question was more > towards generating a subreg move at expand time rather than a split > and handling while outputting asm if you see what I mean. > I see your point now. I don't know how much better if we handle it earlier. Even if I generates subreg move for non-neon code at expand time, the latter output handling is still necessary for neon insns. Do you think it deserves the extra expand handling? thanks Carrot
On 18 July 2012 11:12, Carrot Wei <carrot@google.com> wrote: > On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan > <ramana.radhakrishnan@linaro.org> wrote: >> On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote: >>> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan >>> <ramana.radhakrishnan@linaro.org> wrote: >>>> Carrot, >>>> >>>> Sorry about the delayed response. >>>> >>>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote: >>>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan >>>>> <ramana.radhakrishnan@linaro.org> wrote: >>>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote: >>>>>>> Hi >>>>>>> >>>>>>> This is the second part of the patches that deals with 64bit and. It directly >>>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit >>>>>>> constant operands. >>>>>>> >>>>>> >>>>>> Comments about const_di_ok_for_op still apply from my review of your add patch. >>>>>> >>>>>> However I don't see why and /ior / xor with constants that have either >>>>>> the low or high parts set can't be expanded directly into ands of >>>>>> subregs with moves of zero's or the original value especially if you >>>>>> aren't looking at doing 64 bit operations in neon .With Neon being >>>>>> used for 64 bit arithmetic it gets more interesting. >>>>>> >>>>>> Finally this should target PR target/53189. >>>>>> >>>>> >>>>> Hi Ramana >>>>> >>>>> Thanks for the review. Following is the updated patch according to >>>>> your comments. >>>> >>>> You've missed answering this part of my review :) >>>> >>>>>> However I don't see why and /ior / xor with constants that have either >>>>>> the low or high parts set can't be expanded directly into ands of >>>>>> subregs with moves of zero's or the original value especially if you >>>>>> aren't looking at doing 64 bit operations in neon .With Neon being >>>>>> used for 64 bit arithmetic it gets more interesting. >>>> >>> It has been handled by the const_ok_for_dimode_op and the output part >>> of corresponding SI mode insn. Let's take the IOR case as an example. >>> >> >> I noticed that - If I wasn't clear enough, the question was more >> towards generating a subreg move at expand time rather than a split >> and handling while outputting asm if you see what I mean. >> > I see your point now. I don't know how much better if we handle it > earlier. Even if I generates subreg move for non-neon code at expand > time, the latter output handling is still necessary for neon insns. Just a quick note to let you know that I had a look this week and I'd rather have the splitters deal with the idioms properly rather than change the SImode patterns to deal with mov's . In theory with the proper idiom type splitting dead loads can disappear and a number of your tests that do and with the upper 32 bits as 0 don't really need a load from memory and so on. I'd rather have it split before reload in that form if possible. The neon case is something I don't yet have an answer to but I'm gonna take a look. regards, ramana
--- arm.c (revision 189278) +++ arm.c (working copy) @@ -2524,6 +2524,16 @@ case PLUS: return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode); + case IOR: + if ((const_ok_for_arm (hi_val) || (hi_val == 0xFFFFFFFF)) + && (const_ok_for_arm (lo_val) || (lo_val == 0xFFFFFFFF))) + return 1; + if (TARGET_THUMB2 + && (const_ok_for_arm (lo_val) || const_ok_for_arm (~lo_val)) + && (const_ok_for_arm (hi_val) || const_ok_for_arm (~hi_val))) + return 1; + return 0; + default: return 0; } The 0xFFFFFFFF is not valid arm mode immediate, but ior 0XFFFFFFFF results in all 1's, so it is also allowed in an iordi3 insn. And the patch in iorsi3_insn pattern explicitly check the all 0's and all 1's cases, and output either a simple register mov instruction or instruction mov all 1's to the destination. @@ -2902,10 +2915,29 @@ (ior:SI (match_operand:SI 1 "s_register_operand" "%r,r,r") (match_operand:SI 2 "reg_or_int_operand" "rI,K,?n")))] "TARGET_32BIT" - "@ - orr%?\\t%0, %1, %2 - orn%?\\t%0, %1, #%B2 - #" + "* + { + if (CONST_INT_P (operands[2])) + { + HOST_WIDE_INT i = INTVAL (operands[2]) & 0xFFFFFFFF; + if (i == 0xFFFFFFFF) + return \"mvn%?\\t%0, #0\"; + if (i == 0) + { + if (!rtx_equal_p (operands[0], operands[1])) + return \"mov%?\\t%0, %1\"; + else + return \"\"; + } + } + + switch (which_alternative) + { + case 0: return \"orr%?\\t%0, %1, %2\"; + case 1: return \"orn%?\\t%0, %1, #%B2\"; + case 2: return \"#\"; + } + }" "TARGET_32BIT && GET_CODE (operands[2]) == CONST_INT && !(const_ok_for_arm (INTVAL (operands[2])) > Is there any reason why we don't split such cases earlier into the > constituent moves and the associated ands earlier than reload in the > non-Neon case? > I referenced pattern arm_adddi3 which is split after reload_completed. And the pattern arm_subdi3 is even not split. I guess they keep the original constant longer may benefit some optimizations involving constants. But it may also lose flexibility in other cases. > In addition, it would be good to have some tests for Thumb2 that deal > with the replicated constants for Thumb2 . Can you have a look at > creating some tests similar to the thumb2*replicated*.c tests in > gcc.target/arm but for 64 bit constants ? > The new test cases involving thumb2 replicated constants are added as following. thanks Carrot 2012-07-18 Wei Guozhi <carrot@google.com> PR target/53189 * gcc.target/arm/pr53189-10.c: New testcase. * gcc.target/arm/pr53189-11.c: New testcase. * gcc.target/arm/pr53189-12.c: New testcase. Index: pr53189-10.c =================================================================== --- pr53189-10.c (revision 0) +++ pr53189-10.c (revision 0) @@ -0,0 +1,9 @@ +/* { dg-options "-mthumb -O2" } */ +/* { dg-require-effective-target arm_thumb2_ok } */ +/* { dg-final { scan-assembler-not "mov" } } */ +/* { dg-final { scan-assembler "and.*#-16843010" } } */ + +void t0p(long long * p) +{ + *p &= 0x9fefefefe; +} Index: pr53189-11.c =================================================================== --- pr53189-11.c (revision 0) +++ pr53189-11.c (revision 0) @@ -0,0 +1,9 @@ +/* { dg-options "-mthumb -O2" } */ +/* { dg-require-effective-target arm_thumb2_ok } */ +/* { dg-final { scan-assembler-not "mov" } } */ +/* { dg-final { scan-assembler "eor.*#-1426019584" } } */ + +void t0p(long long * p) +{ + *p ^= 0x7ab00ab00; +} Index: pr53189-12.c =================================================================== --- pr53189-12.c (revision 0) +++ pr53189-12.c (revision 0) @@ -0,0 +1,9 @@ +/* { dg-options "-mthumb -O2" } */ +/* { dg-require-effective-target arm_thumb2_ok } */ +/* { dg-final { scan-assembler-not "mov" } } */ +/* { dg-final { scan-assembler "orr.*#13435085" } } */ + +void t0p(long long * p) +{ + *p |= 0x500cd00cd; +}