Message ID | CAKdteOabZUzYqGDWCKuN_pU=pYTZa5eDcgOv6PReiEA4SHJHsg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 10/06/16 14:39, Christophe Lyon wrote: > On 10 June 2016 at 10:38, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote: >> On 09/06/16 13:14, Christophe Lyon wrote: >>> On 8 June 2016 at 18:40, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> >>> wrote: >>>> On 07/06/16 20:34, Christophe Lyon wrote: >>>>> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> >>>>> wrote: >>>>>> Hi all, >>>>>> >>>>>> In this PR we want to optimise: >>>>>> int foo (int i) >>>>>> { >>>>>> return (i == 0) ? N : __builtin_clz (i); >>>>>> } >>>>>> >>>>>> on targets where CLZ is defined at zero to the constant 'N'. >>>>>> This is determined at the RTL level through the >>>>>> CLZ_DEFINED_VALUE_AT_ZERO >>>>>> macro. >>>>>> The obvious place to implement this would be in combine through >>>>>> simplify-rtx >>>>>> where we'd >>>>>> recognise an IF_THEN_ELSE of the form: >>>>>> (set (reg:SI r1) >>>>>> (if_then_else:SI (ne (reg:SI r2) >>>>>> (const_int 0 [0])) >>>>>> (clz:SI (reg:SI r2)) >>>>>> (const_int 32))) >>>>>> >>>>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd >>>>>> simplify >>>>>> it into >>>>>> just (clz:SI (reg:SI r2)). >>>>>> However, I found this doesn't quite happen for a couple of reasons: >>>>>> 1) This depends on ifcvt or some other pass to have created a >>>>>> conditional >>>>>> move of the >>>>>> two branches that provide the IF_THEN_ELSE to propagate the const_int >>>>>> and >>>>>> clz operation into. >>>>>> >>>>>> 2) Combine will refuse to propagate r2 from the above example into both >>>>>> the >>>>>> condition and the >>>>>> CLZ at the same time, so the most we see is: >>>>>> (set (reg:SI r1) >>>>>> (if_then_else:SI (ne (reg:CC cc) >>>>>> (const_int 0)) >>>>>> (clz:SI (reg:SI r2)) >>>>>> (const_int 32))) >>>>>> >>>>>> which is not enough information to perform the simplification. >>>>>> >>>>>> This patch implements the optimisation in ce1 using the noce ifcvt >>>>>> framework. >>>>>> During ifcvt noce_process_if_block can see that we're trying to >>>>>> optimise >>>>>> something >>>>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of >>>>>> all >>>>>> the information >>>>>> needed to perform the transformation. >>>>>> >>>>>> The transformation is performed by adding a new noce_try* function that >>>>>> tries to put the >>>>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and >>>>>> try >>>>>> to >>>>>> simplify that >>>>>> using the simplify-rtx machinery. That way, we can implement the >>>>>> simplification logic in >>>>>> simplify-rtx.c where it belongs. >>>>>> >>>>>> A similar transformation for CTZ is implemented as well. >>>>>> So for code: >>>>>> int foo (int i) >>>>>> { >>>>>> return (i == 0) ? 32 : __builtin_clz (i); >>>>>> } >>>>>> >>>>>> On aarch64 we now emit: >>>>>> foo: >>>>>> clz w0, w0 >>>>>> ret >>>>>> >>>>>> instead of: >>>>>> foo: >>>>>> mov w1, 32 >>>>>> clz w2, w0 >>>>>> cmp w0, 0 >>>>>> csel w0, w2, w1, ne >>>>>> ret >>>>>> >>>>>> and for arm similarly we generate: >>>>>> foo: >>>>>> clz r0, r0 >>>>>> bx lr >>>>>> >>>>>> instead of: >>>>>> foo: >>>>>> cmp r0, #0 >>>>>> clzne r0, r0 >>>>>> moveq r0, #32 >>>>>> bx lr >>>>>> >>>>>> >>>>>> and for x86_64 with -O2 -mlzcnt we generate: >>>>>> foo: >>>>>> xorl %eax, %eax >>>>>> lzcntl %edi, %eax >>>>>> ret >>>>>> >>>>>> instead of: >>>>>> foo: >>>>>> xorl %eax, %eax >>>>>> movl $32, %edx >>>>>> lzcntl %edi, %eax >>>>>> testl %edi, %edi >>>>>> cmove %edx, %eax >>>>>> ret >>>>>> >>>>>> >>>>>> I tried getting this to work on other targets as well, but encountered >>>>>> difficulties. >>>>>> For example on powerpc the two arms of the condition seen during ifcvt >>>>>> are: >>>>>> >>>>>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ]) >>>>>> (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64} >>>>>> (nil)) >>>>>> and >>>>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0) >>>>>> (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 >>>>>> {clzsi2} >>>>>> (expr_list:REG_DEAD (reg/v:DI 157 [ i ]) >>>>>> (nil))) >>>>>> >>>>>> So the setup code in noce_process_if_block sees that the set >>>>>> destination >>>>>> is >>>>>> not the same >>>>>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) >>>>>> 0)) >>>>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) >>>>>> check. >>>>>> I suppose that's a consequence of how SImode operations are represented >>>>>> in >>>>>> early RTL >>>>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt >>>>>> can >>>>>> be >>>>>> taught to handle >>>>>> destinations that are subregs of one another, but that would be a >>>>>> separate >>>>>> patch. >>>>>> >>>>>> Anyway, is this patch ok for trunk? >>>>>> >>>>>> Bootstrapped and tested on arm-none-linux-gnueabihf, >>>>>> aarch64-none-linux-gnu, >>>>>> x86_64-pc-linux-gnu. >>>>>> >>>>>> Thanks, >>>>>> Kyrill >>>>>> >>>>>> 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>>>>> >>>>>> PR middle-end/37780 >>>>>> * ifcvt.c (noce_try_ifelse_collapse): New function. >>>>>> Declare prototype. >>>>>> (noce_process_if_block): Call noce_try_ifelse_collapse. >>>>>> * simplify-rtx.c (simplify_cond_clz_ctz): New function. >>>>>> (simplify_ternary_operation): Use the above to simplify >>>>>> conditional CLZ/CTZ expressions. >>>>>> >>>>>> 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>>>>> >>>>>> PR middle-end/37780 >>>>>> * gcc.c-torture/execute/pr37780.c: New test. >>>>>> * gcc.target/aarch64/pr37780_1.c: Likewise. >>>>>> * gcc.target/arm/pr37780_1.c: Likewise. >>>>> Hi Kyrylo, >>>> >>>> Hi Christophe, >>>> >>>>> I've noticed that gcc.target/arm/pr37780_1.c fails on >>>>> arm if arch < v6. >>>>> I first tried to fix the effective-target guard (IIRC, the doc >>>>> says clz is available starting with v5t), but that isn't sufficient. >>>>> >>>>> When compiling for armv5-t, the scan-assembler directives >>>>> fail. It seems to work with v6t2, so I am wondering whether >>>>> it's just a matter of increasing the effective-target arch version, >>>>> or if you really intended to make the test pass on these old >>>>> architectures? >>>> >>>> I've dug into it a bit. >>>> I think the problem is that CLZ is available with ARMv5T but only in arm >>>> mode. >>>> In Thumb mode it's only available with ARMv6T2. >>>> So if your armv5t gcc compiles for Thumb by default you'll get the >>>> failure. >>> Actually this gcc is configured with default cpu & mode, >>> target=arm-none-eabi. >>> >>> So I think it's arm7tdmi, arm mode. >>> >>>> I think just bumping the effective to armv6t2 here is appropriate as I >>>> intended >>>> the test to just check the ifcvt transformation when the instruction is >>>> available >>>> rather than test that the CLZ instruction is available at one >>>> architecture >>>> level >>>> or another. >>>> >>>> A patch to bump the effective target check is pre-approved if you want to >>>> do >>>> it. >>>> Otherwise I'll get to it in a few days. >>>> >>> I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but >>> this would not be sufficient without adding >>> dg-add-options arm_arch_v6t2 >>> but then if gcc is configured with a higher default architecture, we'll >>> end >>> up testing if works for v6t2 only. >> >> I'm okay with adding "dg-add-options arm_arch_v6t2". >> We're not testing the availability of the instruction here, just the ifcvt >> transformation when it *is* available, so I think we should just add >> whatever >> option guarantees that instruction is available. >> >> Kyrill >> > OK, I've committed the attached patch. > > 2016-06-10 Christophe Lyon <christophe.lyon@linaro.org> > > * gcc.target/arm/pr37780_1.c: Use arm_arch_v6t2 effective target > and options. > Thanks for taking care of this. Kyrill >>>> Thanks for spotting this, >>>> Kyrill >>>>> Thanks, >>>>> >>>>> Christophe. >>>>
diff --git a/gcc/testsuite/gcc.target/arm/pr37780_1.c b/gcc/testsuite/gcc.target/arm/pr37780_1.c index b954fe5..8e06920 100644 --- a/gcc/testsuite/gcc.target/arm/pr37780_1.c +++ b/gcc/testsuite/gcc.target/arm/pr37780_1.c @@ -2,8 +2,9 @@ being defined at zero. */ /* { dg-do compile } */ -/* { dg-require-effective-target arm_arch_v5_ok } */ +/* { dg-require-effective-target arm_arch_v6t2_ok } */ /* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v6t2 } */ int fooctz (int i)