
[ARM,4/4] Adjust gcc.target/arm/wmul-[123].c tests

Message ID 56A1FBEE.5020905@foss.arm.com
State New

Commit Message

Kyrill Tkachov Jan. 22, 2016, 9:52 a.m. UTC
Hi all,

In this final patch I adjust the troublesome gcc.target/arm/wmul-[123].c tests
to make them more helpful.
gcc.target/arm/wmul-[12].c may now generate either sign-extending multiplies
(+accumulate) or normal 32-bit multiplies since the arguments to the multiplies
are already sign-extended by preceding loads.
So for these tests the patch adds -mtune=cortex-a9, a tuning for which we know the
sign-extending form to be beneficial. This is, of course, reflected in the rtx costs
that guide the RTL optimisers (after the fixes in patches 2 and 3).
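
As an illustration of why either form is valid for wmul-[12].c: once both operands
have been sign-extended from 16 bits, a 16x16->32 sign-extending multiply and a plain
32-bit multiply produce the same low 32 bits. The sketch below only models the
arithmetic; the helper names (sext16, model_ldrsh, model_smulbb, model_mul) are
hypothetical and say nothing about instruction cost.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of X to 32 bits, written portably.  */
static int32_t
sext16 (uint32_t x)
{
  int32_t lo = (int32_t) (x & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Model of ldrsh: load a halfword and sign-extend it into a register.  */
static uint32_t
model_ldrsh (const int16_t *p)
{
  return (uint32_t) (int32_t) *p;
}

/* Model of smulbb: multiply the sign-extended bottom halfwords.  */
static int32_t
model_smulbb (uint32_t x, uint32_t y)
{
  return sext16 (x) * sext16 (y);
}

/* Model of mul: low 32 bits of the full register product.  */
static uint32_t
model_mul (uint32_t x, uint32_t y)
{
  return x * y;
}
```

For registers produced by model_ldrsh, (uint32_t) model_smulbb (x, y) and
model_mul (x, y) agree, so the choice between the two forms is purely a cost decision.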

For wmul-3.c we now generate objectively better code.
For the loop we previously generated:
.L2:
     ldrh    r1, [lr, #2]!
     ldrh    ip, [r0, #2]!
     smulbb    ip, r1, ip
     sub    r4, r4, ip
     smulbb    r1, r1, r1
     sub    r2, r2, r1
     cmp    lr, r5
     bne    .L2

and now we generate:
.L2:
     ldrsh    r1, [ip, #2]!
     ldrsh    r4, [r0, #2]!
     mls    lr, r1, r4, lr
     mls    r2, r1, r1, r2
     cmp    ip, r5
     bne    .L2

AFAICT the new sequence is better than the old one even for -mtune=cortex-a9 since it
contains two fewer instructions.
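
To make the equivalence of the two loop bodies concrete, here is a rough C model of
one accumulator update in each sequence. It assumes the usual semantics (ldrh
zero-extends, ldrsh sign-extends, smulbb multiplies the sign-extended bottom
halfwords, mls computes acc - a * b in 32 bits); old_step and new_step are
hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of X to 32 bits.  */
static int32_t
sext16 (uint32_t x)
{
  int32_t lo = (int32_t) (x & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Old sequence: ldrh; ldrh; smulbb; sub.  */
static uint32_t
old_step (uint32_t acc, const uint16_t *pa, const uint16_t *pb)
{
  uint32_t a = *pa;                       /* ldrh: zero-extending load */
  uint32_t b = *pb;                       /* ldrh */
  int32_t prod = sext16 (a) * sext16 (b); /* smulbb */
  return acc - (uint32_t) prod;           /* sub */
}

/* New sequence: ldrsh; ldrsh; mls.  */
static uint32_t
new_step (uint32_t acc, const uint16_t *pa, const uint16_t *pb)
{
  uint32_t a = (uint32_t) sext16 (*pa);   /* ldrsh: sign-extending load */
  uint32_t b = (uint32_t) sext16 (*pb);   /* ldrsh */
  return acc - a * b;                     /* mls: multiply and subtract */
}
```

Both functions return the same accumulator value for any pair of halfwords; the new
form simply folds each multiply and subtract into one instruction.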

So this test no longer reliably produces smulbb instructions.
The proposed change in this patch is to greatly simplify it to a one-liner that
we can always expect to be compiled into a single smulbb instruction.
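
For reference, the one-liner (the same body as the new wmul-3.c in the patch) only
depends on the low 16 bits of each argument, which is the shape smulbb handles
directly. The comment about narrowing is an assumption about the compiler: converting
an out-of-range int to short is implementation-defined in ISO C, and GCC documents
reduction modulo 2^16.

```c
/* Multiply the two arguments after narrowing each to short.  With a
   16-bit short and GCC's modulo narrowing, only the bottom halfword of
   each argument contributes, and the product is computed in int.  */
int
foo (int a, int b)
{
  return (short) a * (short) b;
}
```

For example, foo (0x10003, 0x20004) ignores the upper halfwords and behaves like
foo (3, 4).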

Tested on arm-none-eabi.
Ok for trunk?

Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/arm/wmul-3.c: Simplify test to generate just
     a single smulbb instruction.
     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
     * gcc.target/arm/wmul-2.c: Likewise.

Comments

Bernd Schmidt Jan. 22, 2016, 2:53 p.m. UTC | #1
On 01/22/2016 10:52 AM, Kyrill Tkachov wrote:

> AFAICT the new sequence is better than the old one even for
> -mtune=cortex-a9 since it contains two fewer instructions.

Just curious (I think this patch series is good but will leave it to the 
arm folks) - are these instructions equally expensive? Some CPUs are 
faster when doing widening multiplies on smaller objects.


Bernd

Kyrill Tkachov Jan. 22, 2016, 2:59 p.m. UTC | #2
Hi Bernd,

On 22/01/16 14:53, Bernd Schmidt wrote:
> On 01/22/2016 10:52 AM, Kyrill Tkachov wrote:
>
>> AFAICT the new sequence is better than the old one even for
>> -mtune=cortex-a9 since it contains two fewer instructions.
>
> Just curious (I think this patch series is good but will leave it to the arm folks) - are these instructions equally expensive? Some CPUs are faster when doing widening multiplies on smaller objects.
>

The widening multiplies are indeed faster on some targets (which is why we want to keep them in the wmul-[12].c tests).
But for wmul-3.c the new sequence uses fewer instructions. So, while the resulting sequences should be
of similar performance overall, the new sequence has a smaller code size.

Kyrill

>
> Bernd

Patch

diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c
index ddddd509fe645ea98877753773e7bcf9b6787897..c340f960fa444642fe18ae3bcac93d78fe9dc851 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-1.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-1.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c
index 2ea55f9fbe12f74f38754cb72be791fd6e9495f4..bd2435c9113a82d2e102b545b3141cbda9ba326d 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-2.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-2.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 void vec_mpy(int y[], const short x[], short scaler)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c
index 144b553082e6158701639f05929987de01e7125a..87eba740142a80a1dc1979b4e79d9272a839e7b2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-3.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-3.c
@@ -1,19 +1,11 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O" } */
 
-int mac(const short *a, const short *b, int sqr, int *sum)
+int
+foo (int a, int b)
 {
-  int i;
-  int dotp = *sum;
-
-  for (i = 0; i < 150; i++) {
-    dotp -= b[i] * a[i];
-    sqr -= b[i] * b[i];
-  }
-
-  *sum = dotp;
-  return sqr;
+  return (short) a * (short) b;
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 2 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 } } */