Message ID | 56A1FBEE.5020905@foss.arm.com |
---|---|
State | New |
On 01/22/2016 10:52 AM, Kyrill Tkachov wrote:
> AFAICT the new sequence is better than the old one even for
> -mtune=cortex-a9 since it contains two fewer instructions.

Just curious (I think this patch series is good but will leave it to the arm folks) - are these instructions equally expensive? Some CPUs are faster when doing widening multiplies on smaller objects.

Bernd
Hi Bernd,

On 22/01/16 14:53, Bernd Schmidt wrote:
> On 01/22/2016 10:52 AM, Kyrill Tkachov wrote:
>
>> AFAICT the new sequence is better than the old one even for
>> -mtune=cortex-a9 since it contains two fewer instructions.
>
> Just curious (I think this patch series is good but will leave it to the arm folks) - are these instructions equally expensive? Some CPUs are faster when doing widening multiplies on smaller objects.
>

The widening multiplies are indeed faster on some targets (which is why we want to keep them in the wmul-[12].c tests). But for wmul-3.c the new sequence uses fewer instructions. So, while the resulting sequences should be of similar performance overall, the new sequence has a smaller code size.

Kyrill

>
> Bernd
diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c
index ddddd509fe645ea98877753773e7bcf9b6787897..c340f960fa444642fe18ae3bcac93d78fe9dc851 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-1.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c
index 2ea55f9fbe12f74f38754cb72be791fd6e9495f4..bd2435c9113a82d2e102b545b3141cbda9ba326d 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-2.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 void vec_mpy(int y[], const short x[], short scaler)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c
index 144b553082e6158701639f05929987de01e7125a..87eba740142a80a1dc1979b4e79d9272a839e7b2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-3.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-3.c
@@ -1,19 +1,11 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O" } */
 
-int mac(const short *a, const short *b, int sqr, int *sum)
+int
+foo (int a, int b)
 {
-  int i;
-  int dotp = *sum;
-
-  for (i = 0; i < 150; i++) {
-    dotp -= b[i] * a[i];
-    sqr -= b[i] * b[i];
-  }
-
-  *sum = dotp;
-  return sqr;
+  return (short) a * (short) b;
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 2 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 } } */
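
For readers unfamiliar with the pattern the rewritten wmul-3.c exercises, here is a minimal sketch of what a signed 16x16->32 widening multiply computes. It is only an illustration and is not part of the patch: `widening_mul16` mirrors the shape of `foo` from the new test, while `reference_mul16` is a hypothetical helper that spells out the sign extension of the low halfwords by hand. On ARM targets with the DSP extensions, `smulbb` multiplies the signed bottom halfwords of two registers into a 32-bit result, which is why the test expects a single `smulbb` for this expression.

```c
#include <stdint.h>

/* Same shape as foo() in the new wmul-3.c: each operand is narrowed to
   16 bits, then promoted back to int before the multiply.  Narrowing an
   out-of-range int to short is implementation-defined in ISO C; GCC
   reduces the value modulo 2^16, which is what this sketch assumes.  */
int
widening_mul16 (int a, int b)
{
  return (short) a * (short) b;
}

/* Hypothetical reference helper (not in the patch): take the low 16 bits
   of each operand, sign-extend them by hand, and multiply.  The largest
   magnitude product is (-32768) * (-32768) = 2^30, so the result always
   fits in 32 bits.  */
int32_t
reference_mul16 (int32_t a, int32_t b)
{
  int32_t lo_a = a & 0xffff;
  int32_t lo_b = b & 0xffff;

  if (lo_a & 0x8000)
    lo_a -= 0x10000;   /* sign-extend bit 15 */
  if (lo_b & 0x8000)
    lo_b -= 0x10000;

  return lo_a * lo_b;
}
```

Under the stated assumption about narrowing, both functions should agree for any pair of inputs, for example `(0x10003, 4)`, where only the low halfword `3` takes part in the multiply.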