[ARM] Dot Product NEON intrinsics [Patch (3/8)]

Message ID 20171106165336.GA12409@arm.com
State New
Series [ARM] Dot Product NEON intrinsics [Patch (3/8)]

Commit Message

Tamar Christina Nov. 6, 2017, 4:53 p.m. UTC
Hi All,

This patch adds the NEON intrinsics for Dot product.

Dot product is available from ARMv8.2-a and onwards.

Regtested on arm-none-eabi, armeb-none-eabi,
aarch64-none-elf and aarch64_be-none-elf with no issues found.

Ok for trunk?

gcc/
2017-11-06  Tamar Christina  <tamar.christina@arm.com>

	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
	(vdot_s32, vdotq_s32): New.
	(vdot_lane_u32, vdotq_lane_u32): New.
	(vdot_lane_s32, vdotq_lane_s32): New.


gcc/testsuite/
2017-11-06  Tamar Christina  <tamar.christina@arm.com>

	* gcc.target/arm/simd/vdot-compile.c: New.
	* gcc.target/arm/simd/vect-dot-qi.h: New.
	* gcc.target/arm/simd/vect-dot-s8.c: New.
	* gcc.target/arm/simd/vect-dot-u8.c: New.

--
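
For readers unfamiliar with the operation, the arithmetic behind the 64-bit unsigned variant can be sketched in portable C. This is a scalar reference model under stated assumptions, not code from the patch: each 32-bit accumulator lane receives the sum of four adjacent 8-bit products. The name `ref_udot_2x32` is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/* Scalar model of an unsigned 8-bit dot product accumulated into two
   32-bit lanes.  Hypothetical helper for illustration only; it is not
   part of arm_neon.h.  */
void
ref_udot_2x32 (uint32_t acc[2], const uint8_t a[8], const uint8_t b[8])
{
  for (int lane = 0; lane < 2; lane++)
    for (int j = 0; j < 4; j++)
      /* Widen before multiplying so the product is computed in 32 bits.  */
      acc[lane] += (uint32_t) a[4 * lane + j] * b[4 * lane + j];
}
```

The accumulator is updated in place, mirroring the accumulate-into-first-operand shape of the intrinsics named in the ChangeLog.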

Comments

Tamar Christina Nov. 21, 2017, 5:28 p.m. UTC | #1
Ping

> -----Original Message-----
> From: Tamar Christina [mailto:tamar.christina@arm.com]
> Sent: Monday, November 6, 2017 16:54
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]
>
> Hi All,
>
> This patch adds the NEON intrinsics for Dot product.
>
> Dot product is available from ARMv8.2-a and onwards.
>
> Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> aarch64_be-none-elf with no issues found.
>
> Ok for trunk?
>
> gcc/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
> 	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
> 	(vdot_s32, vdotq_s32): New.
> 	(vdot_lane_u32, vdotq_lane_u32): New.
> 	(vdot_lane_s32, vdotq_lane_s32): New.
>
>
> gcc/testsuite/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
> 	* gcc.target/arm/simd/vdot-compile.c: New.
> 	* gcc.target/arm/simd/vect-dot-qi.h: New.
> 	* gcc.target/arm/simd/vect-dot-s8.c: New.
> 	* gcc.target/arm/simd/vect-dot-u8.c: New
>
> --
Kyrill Tkachov Nov. 22, 2017, 11:26 a.m. UTC | #2
Hi Tamar,

On 06/11/17 16:53, Tamar Christina wrote:
> Hi All,
>
> This patch adds the NEON intrinsics for Dot product.
>
> Dot product is available from ARMv8.2-a and onwards.
>
> Regtested on arm-none-eabi, armeb-none-eabi,
> aarch64-none-elf and aarch64_be-none-elf with no issues found.
>
> Ok for trunk?
>
> gcc/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
>         * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)

This should be config/arm/arm_neon.h

>         (vdot_s32, vdotq_s32): New.
>         (vdot_lane_u32, vdotq_lane_u32): New.
>         (vdot_lane_s32, vdotq_lane_s32): New.
>
>
> gcc/testsuite/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
>         * gcc.target/arm/simd/vdot-compile.c: New.
>         * gcc.target/arm/simd/vect-dot-qi.h: New.
>         * gcc.target/arm/simd/vect-dot-s8.c: New.
>         * gcc.target/arm/simd/vect-dot-u8.c: New
>
> -- 

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
  
  #endif
  
+/* Adv.SIMD Dot Product intrinsics.  */

Please no full stop: "AdvSIMD".

+
+#pragma GCC push_options
+#if __ARM_ARCH >= 8
+#pragma GCC target ("arch=armv8.2-a+dotprod")

<snip>

diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
new file mode 100644
index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
@@ -0,0 +1,15 @@
+TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+
+__attribute__ ((noinline)) int
+foo1(int len) {
+  int i;
+  TYPE int result = 0;
+  TYPE short prod;
+
+  for (i=0; i<len; i++) {
+    prod = X[i] * Y[i];
+    result += prod;
+  }
+  return result;
+}
\ No newline at end of file

Please add new lines at the end of the new test files.
This applies to a few more new files in this patch.

Ok with these nits fixed.

Thanks,
Kyrill
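
The test header above leaves `TYPE` and `N` to be supplied by whichever file includes it. A minimal sketch of one possible instantiation follows; the values `unsigned` and `64` and the helper `fill_inputs` are assumptions chosen for illustration, not taken from the patch.

```c
/* Assumed definitions; the real including files may use other values.  */
#define N 64
#define TYPE unsigned

TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));

/* Same reduction loop as in vect-dot-qi.h: 8-bit inputs, a 16-bit
   product and a 32-bit accumulator.  */
__attribute__ ((noinline)) int
foo1 (int len)
{
  int i;
  TYPE int result = 0;
  TYPE short prod;

  for (i = 0; i < len; i++)
    {
      prod = X[i] * Y[i];
      result += prod;
    }
  return result;
}

/* Hypothetical helper: fill the inputs with small known values.  */
void
fill_inputs (void)
{
  for (int i = 0; i < N; i++)
    {
      X[i] = (TYPE char) i;
      Y[i] = 2;
    }
}
```

With these inputs every product fits in the 16-bit `prod`, so the loop computes a plain widening multiply-accumulate over 8-bit elements.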
Christophe Lyon Nov. 23, 2017, 11:26 p.m. UTC | #3
On 22 November 2017 at 12:26, Kyrill  Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi Tamar,
>
> On 06/11/17 16:53, Tamar Christina wrote:
>>
>> Hi All,
>>
>> This patch adds the NEON intrinsics for Dot product.
>>
>> Dot product is available from ARMv8.2-a and onwards.
>>
>> Regtested on arm-none-eabi, armeb-none-eabi,
>> aarch64-none-elf and aarch64_be-none-elf with no issues found.
>>
>> Ok for trunk?
>>
>> gcc/
>> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>>
>>         * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
>
>
> This should be config/arm/arm_neon.h
>
>>         (vdot_s32, vdotq_s32): New.
>>         (vdot_lane_u32, vdotq_lane_u32): New.
>>         (vdot_lane_s32, vdotq_lane_s32): New.
>>
>>
>> gcc/testsuite/
>> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>>
>>         * gcc.target/arm/simd/vdot-compile.c: New.
>>         * gcc.target/arm/simd/vect-dot-qi.h: New.
>>         * gcc.target/arm/simd/vect-dot-s8.c: New.
>>         * gcc.target/arm/simd/vect-dot-u8.c: New
>>
>> --
>
>
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> index
> 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a
> 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
>   #endif
>  +/* Adv.SIMD Dot Product intrinsics.  */
>
> Please no full stop: "AdvSIMD".
>
> +
> +#pragma GCC push_options
> +#if __ARM_ARCH >= 8
> +#pragma GCC target ("arch=armv8.2-a+dotprod")
>
> <snip>
>
Not sure if Kyrill actually meant to comment about the three lines
above, but they have a bug:
#if should be before #pragma GCC push_options.

Indeed, after this patch was committed (r255064), I've noticed many
regressions; for instance, p64_p128 is now unsupported. This is because
the arm_crypto_ok effective target now fails with this message:
XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline 'vaeseq_u8': target specific option mismatch
Not sure why this wasn't noticed in validations earlier?

Fixed as obvious (r255126).

Christophe

2017-11-24  Christophe Lyon  <christophe.lyon@linaro.org>

	* config/arm/arm_neon.h: Fix pragma GCC push_options before
	vdot_u32.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c9a8d9..d2e936c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18036,8 +18036,8 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
 
 /* AdvSIMD Dot Product intrinsics.  */
 
-#pragma GCC push_options
 #if __ARM_ARCH >= 8
+#pragma GCC push_options
 #pragma GCC target ("arch=armv8.2-a+dotprod")
 
 __extension__ extern __inline uint32x2_t
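
The ordering issue can be illustrated with a small, target-independent sketch. It assumes, as the fix suggests, that the matching `#pragma GCC pop_options` sits inside the same `#if` block; in that layout a push placed before the `#if` would never be popped when the condition is false. `DEMO_HAVE_FEATURE`, `demo_feature_add` and `demo_user_code` are hypothetical names, and `#pragma GCC optimize` merely stands in for the `target` pragma so that the example builds on any GCC target.

```c
/* Hypothetical stand-in for the "__ARM_ARCH >= 8" check.  */
#ifndef DEMO_HAVE_FEATURE
#define DEMO_HAVE_FEATURE 0
#endif

/* Push, option change and pop all live inside the same conditional,
   so the option stack stays balanced whether or not the block is
   compiled.  */
#if DEMO_HAVE_FEATURE
#pragma GCC push_options
#pragma GCC optimize ("O2")   /* stand-in for #pragma GCC target (...) */

static inline int
demo_feature_add (int a, int b)
{
  return a + b;
}

#pragma GCC pop_options
#endif

/* Code after the guarded block is compiled with the caller's original
   options in both configurations.  */
int
demo_user_code (int x)
{
  return x * 2;
}
```

Keeping each push and its pop under the same condition means neither can be compiled without the other.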
Kyrill Tkachov Nov. 24, 2017, 9:33 a.m. UTC | #4
Hi Christophe,

On 23/11/17 23:26, Christophe Lyon wrote:
> On 22 November 2017 at 12:26, Kyrill  Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi Tamar,
>>
>> On 06/11/17 16:53, Tamar Christina wrote:
>>> Hi All,
>>>
>>> This patch adds the NEON intrinsics for Dot product.
>>>
>>> Dot product is available from ARMv8.2-a and onwards.
>>>
>>> Regtested on arm-none-eabi, armeb-none-eabi,
>>> aarch64-none-elf and aarch64_be-none-elf with no issues found.
>>>
>>> Ok for trunk?
>>>
>>> gcc/
>>> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>>>
>>>          * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
>>
>> This should be config/arm/arm_neon.h
>>
>>>          (vdot_s32, vdotq_s32): New.
>>>          (vdot_lane_u32, vdotq_lane_u32): New.
>>>          (vdot_lane_s32, vdotq_lane_s32): New.
>>>
>>>
>>> gcc/testsuite/
>>> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>>>
>>>          * gcc.target/arm/simd/vdot-compile.c: New.
>>>          * gcc.target/arm/simd/vect-dot-qi.h: New.
>>>          * gcc.target/arm/simd/vect-dot-s8.c: New.
>>>          * gcc.target/arm/simd/vect-dot-u8.c: New
>>>
>>> --
>>
>> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
>> index
>> 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a
>> 100644
>> --- a/gcc/config/arm/arm_neon.h
>> +++ b/gcc/config/arm/arm_neon.h
>> @@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
>>    #endif
>>   +/* Adv.SIMD Dot Product intrinsics.  */
>>
>> Please no full stop: "AdvSIMD".
>>
>> +
>> +#pragma GCC push_options
>> +#if __ARM_ARCH >= 8
>> +#pragma GCC target ("arch=armv8.2-a+dotprod")
>>
>> <snip>
>>
> Not sure if Kyrill actually meant to comment about the three lines
> above, but they have a bug:
> #if should be before #pragma GCC push_options.

You're right, sorry for missing this :(


> Indeed, after this patch was committed (r255064), I've noticed many
> regressions, for instance
> p64_p128 is now unsupported. This is because the arm_crypto_ok
> effective target now fails
> with this message:
> XXX/arm_neon.h:16911:1: error: inlining failed in call to
> always_inline 'vaeseq_u8': target specific option mismatch
>
> Not sure why this wasn't noticed in validations earlier?
>
> Fixed as obvious (r255126).

Thank you for fixing this up.

Kyrill
Tamar Christina Nov. 24, 2017, 10:31 a.m. UTC | #5
> Not sure if Kyrill actually meant to comment about the three lines above, but
> they have a bug:
> #if should be before #pragma GCC push_options.
>
> Indeed, after this patch was committed (r255064), I've noticed many
> regressions, for instance
> p64_p128 is now unsupported. This is because the arm_crypto_ok effective
> target now fails with this message:
> XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline
> 'vaeseq_u8': target specific option mismatch
>
> Not sure why this wasn't noticed in validations earlier?


I still have the log files for these runs.

It seems that I was comparing the log files instead of the sum files, and the log files do not show this difference:

/d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.log ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.log  | grep p64_p128
/d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.sum ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.sum  | grep p64_p128
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0 
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  execution test
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  (test for excess errors)
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1 
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  execution test
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  (test for excess errors)
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O2

Sorry for missing this. I don't even know why these scripts accept the log files if they're always going to do the wrong thing.

Anyway, thanks for fixing this; I'll make sure to use the sum files in future.

Tamar
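
One way to make such comparisons less dependent on which file is passed in is to look only at result lines. The following is a minimal sketch, assuming the usual DejaGnu `STATUS: test name` line format; `is_result_line` is a hypothetical helper and is not part of `contrib/dg-cmp-results.sh`.

```c
#include <string.h>

/* Return 1 if LINE starts with a DejaGnu test-result prefix such as
   "PASS: " or "UNSUPPORTED: ", and 0 otherwise.  Hypothetical helper
   for illustration only.  */
int
is_result_line (const char *line)
{
  static const char *const statuses[] = {
    "PASS", "FAIL", "XPASS", "XFAIL", "KPASS", "KFAIL",
    "UNRESOLVED", "UNTESTED", "UNSUPPORTED"
  };

  for (size_t i = 0; i < sizeof statuses / sizeof statuses[0]; i++)
    {
      size_t n = strlen (statuses[i]);
      /* The status word must be followed by ": " to count.  */
      if (strncmp (line, statuses[i], n) == 0
          && line[n] == ':' && line[n + 1] == ' ')
        return 1;
    }
  return 0;
}
```

Filtering on these prefixes keeps only per-test outcomes, which is the information a PASS->NA or NA->UNSUPPORTED transition report is built from.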

Christophe Lyon Nov. 24, 2017, 4:15 p.m. UTC | #6
On 24 November 2017 at 11:31, Tamar Christina <Tamar.Christina@arm.com> wrote:
>> >
>> Not sure if Kyrill actually meant to comment about the three lines above, but
>> they have a bug:
>> #if should be before #pragma GCC push_options.
>>
>> Indeed, after this patch was committed (r255064), I've noticed many
>> regressions, for instance
>> p64_p128 is now unsupported. This is because the arm_crypto_ok effective
>> target now fails with this message:
>> XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline
>> 'vaeseq_u8': target specific option mismatch
>>
>> Not sure why this wasn't noticed in validations earlier?
>
> I still have the log files for these runs:
>
> It seems that I was comparing the log files instead of the sum files, which do not show this difference.
>
> /d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.log ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.log  | grep p64_p128
> /d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.sum ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.sum  | grep p64_p128
> NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  execution test
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O0  (test for excess errors)
> NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  execution test
> PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O1  (test for excess errors)
> NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c   -O2
>
> Sorry for missing this, I don't even know why these scripts accept the log files if they're always going to do the wrong thing.
>
> Anyway thanks for fixing this and I'll make sure I'm using the sum files in the future.
>

Thanks for checking why you missed it.

That being said, I think there are a few more problems with your
patch, but there is a lot of "noise" in the reports.

After your commit, I have these reports:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html

After my commit, I have these reports:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html

I haven't fully checked that my patch fixes all the regressions
reported at r255064, but I don't see why my patch would introduce
regressions. So I think your patch is causing problems:
* on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
    gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
    gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
    gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]

* on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
(the 2 "BIG-REGR" entries)

where a few tests fail:
(arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
    gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/pr65947-14.c execution test

(armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
  Executed from: gcc.dg/vect/vect.exp
    gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/pr51074.c execution test
    gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/pr65947-14.c execution test
    gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/vect-nb-iter-ub-2.c execution test
    gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/vect-nb-iter-ub-3.c execution test
    gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/vect-strided-shift-1.c execution test
    gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
    gcc.dg/vect/vect-strided-u16-i3.c execution test
  Executed from: gcc.target/arm/arm.exp
    gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
    gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
    gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
    gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
    gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
    gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1 loops" 1 (found 0 times)

I haven't checked whether these tests were already failing before your
patch and are just reported as new failures because they failed to
compile in the meantime.

Not sure I am clear :-)

Sorry for the delay and potentially hard to parse reports, I'm
struggling with infrastructure problems.

Thanks,

Christophe

Tamar Christina Nov. 24, 2017, 6:05 p.m. UTC | #7
Hi Christophe,

> After your commit, I have these reports:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>
> After my commit, I have these reports:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>
> I haven't fully checked that my patch fixes all the regressions reported at
> r255064, but I don't see why my patch would introduce regressions.... So I
> think your patch is causing problems:
> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>     gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>     gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>     gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>
> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
> (the 2 "BIG-REGR" entries)


This patch only introduced a few NEON intrinsics in arm_neon.h, and most of these files don't use the header.

gcc.dg/vect/pr65947-14.c doesn't exist in my tree, so it's a relatively new test.

I will run some regressions over the weekend on an updated tree, but I can't understand how a header that isn't included can cause execution failures 😊
However, most of those are vectorizer tests. It seems much more likely to me that vectorization itself is broken.

Thanks,
Tamar

Christophe Lyon Nov. 24, 2017, 7:38 p.m. UTC | #8
On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
> Hi Christophe,
>
>>
>> After your commit, I have these reports:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>
>> After my commit, I have these reports:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>>
>> I haven't fully checked that my patch fixes all the regressions reported at
>> r255064, but I don't see why my patch would introduce regressions.... So I
>> think your patch is causing problems:
>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>>     gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>     gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>     gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>
>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
>> (the 2 "BIG-REGR" entries)
>
> This patch only introduced a few NEON intrinsics in arm_neon.h, and most of these files don't use the header.
>
> gcc.dg/vect/pr65947-14.c doesn't exist in my tree, so it's a relatively new test.
>
> I will run some regressions over the weekend on an updated tree, but I can't understand how a header that isn't included can cause execution failures 😊
> However, most of those are vectorizer tests. It seems much more likely to me that vectorization itself is broken.

Agreed. But note that many regressions are reported for the
configurations --with-fpu vfpv3-d16-fp16
at: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
Maybe that's just a matter of arm_neon.h being included by some
effective-target tests?


Christophe Lyon Nov. 26, 2017, 12:56 p.m. UTC | #9
On 24 November 2017 at 20:38, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
>> Hi Christophe,
>>
>>>
>>> After your commit, I have these reports:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>>
>>> After my commit, I have these reports:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>>>
>>> I haven't fully checked that my patch fixes all the regressions reported at
>>> r255064, but I don't see why my patch would introduce regressions.... So I
>>> think your patch is causing problems:
>>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>>>     gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>>     gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>>     gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>>
>>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
>>> (the 2 "BIG-REGR" entries)
>>
>> This patch only introduced a few NEON intrinsics in arm_neon.h, and most of these files don't use the header.
>>
>> gcc.dg/vect/pr65947-14.c doesn't exist in my tree, so it's a relatively new test.
>>
>> I will run some regressions over the weekend on an updated tree, but I can't understand how a header that isn't included can cause execution failures 😊
>> However, most of those are vectorizer tests. It seems much more likely to me that vectorization itself is broken.
>
> Agreed. But note that many regressions are reported for the
> configurations --with-fpu vfpv3-d16-fp16
> at: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
> Maybe that's just a matter of arm_neon.h being included by some
> effective-target tests?
>
>
Hi Tamar,

Good news, I have confirmed your obvious thoughts: I have run
validations of r255063+your patch fixed, and the results are clean:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html

I have also compared r255063 to r255126 (that is, I applied all patches
between yours and mine):
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255063-255126.patch/report-build-info.html
which confirms some regressions have been introduced in-between,
hidden by the problem in your patch.

Some may be obvious to bisect, some less.

Christophe

>>
>> Thanks,
>> Tamar
>>
>>> where a few tests fail:
>>> (arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
>>>     gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/pr65947-14.c execution test
>>>
>>> (armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
>>>   Executed from: gcc.dg/vect/vect.exp
>>>     gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/pr51074.c execution test
>>>     gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/pr65947-14.c execution test
>>>     gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/vect-nb-iter-ub-2.c execution test
>>>     gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/vect-nb-iter-ub-3.c execution test
>>>     gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/vect-strided-shift-1.c execution test
>>>     gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
>>>     gcc.dg/vect/vect-strided-u16-i3.c execution test
>>>   Executed from: gcc.target/arm/arm.exp
>>>     gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>>     gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>>     gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>>     gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
>>>     gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
>>>     gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1
>>> loops" 1 (found 0 times)
>>>
>>> I haven't checked whether this tests were already failing before your patch,
>>> and are just reported as new failures because they failed to compile in the
>>> mean time.
>>>
>>> Not sure I am clear :-)
>>>
>>> Sorry for the delay and potentially hard to parse reports, I'm struggling with
>>> infrastructure problems.
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>> > Tamar
>>> >
>>> >>
>>> >> Fixed as obvious (r255126).
>>> >>
>>> >> Christophe
>>> >>
>>> >> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>>> >> > b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>>> >> > new file mode 100644
>>> >> > index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>>> >> > @@ -0,0 +1,15 @@
>>> >> > +TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> >> > +TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> >> > +
>>> >> > +__attribute__ ((noinline)) int
>>> >> > +foo1(int len) {
>>> >> > +  int i;
>>> >> > +  TYPE int result = 0;
>>> >> > +  TYPE short prod;
>>> >> > +
>>> >> > +  for (i=0; i<len; i++) {
>>> >> > +    prod = X[i] * Y[i];
>>> >> > +    result += prod;
>>> >> > +  }
>>> >> > +  return result;
>>> >> > +}
>>> >> > \ No newline at end of file
>>> >> >
>>> >> > Please add new lines at the end of the new test files.
>>> >> > This applies to a few more new files in this patch.
>>> >> >
>>> >> > Ok with these nits fixed.
>>> >> >
>>> >> > Thanks,
>>> >> > Kyrill
>>> >> >
Christophe Lyon Nov. 26, 2017, 8:01 p.m. UTC | #10
On 26 November 2017 at 13:56, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 24 November 2017 at 20:38, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
>>> Hi Christophe,
>>>
>>>>
>>>> After your commit, I have these reports:
>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>>>
>>>> After my commit, I have these reports:
>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>>>>
>>>> I haven't fully checked that my patch fixes all the regressions reported at
>>>> r255064, but I don't see why my patch would introduce regressions.... So I
>>>> think your patch is causing problems:
>>>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>>>>     gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>>>     gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>>>     gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>>>
>>>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
>>>> (the 2 "BIG-REGR" entries)
>>>
>>> This patch only introduced a few neon instrinsics in arm_neon.h, and most of these files don't use the header.
>>>
>>> gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new test.
>>>
>>> I will run some regressions over the weekend on an updated tree, but I can't understand how a not included header it can cause execution failures 😊
>>> However most of those are vectorizer tests. It seems much more likely to me that vectorization is broken rather.
>>
>> Agreed. But note that many regressions are reported for the
>> configurations --with-fpu vfpv3-d16-fp16
>> at: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>> Maybe that's just a matter of arm_neon.h being included by some
>> effective-target tests?
>>
>>
> Hi Tamar,
>
> Good news, I have confirmed your obvious thoughts: I have run
> validations of r255063+your patch fixed, and the results are clean:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html
>
> I have also compared r255063 to r255216 (that is I applied all patches
> between yours and mine):
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255063-255126.patch/report-build-info.html
> which confirms some regressions have been introduced in-between,
> hidden by the problem in your patch.
>
> Some may be obvious to bisect, some less.
>
OK, so for gcc:
FAIL: gcc.dg/ipa/inline-1.c scan-ipa-dump inline "op2 change 9.990000. of time"
after r255103, which updated the test

several failures for gcc.target/arm/addr-modes-float.c which was
introduced at r255111 (Charles is aware of that, probably just a
matter of adding the right effective-target)

I'm still trying to reproduce the regression:
FAIL: gcc.dg/vect/vect-nb-iter-ub-2.c execution test
on armeb

and for g++:
g++.dg/ipa/devirt-22.C  -std=gnu++11  scan-ipa-dump-times cp
"Discovered a virtual call to a known target" 1 (found 2 times)
g++.dg/ipa/devirt-22.C  -std=gnu++14  scan-ipa-dump-times cp
"Discovered a virtual call to a known target" 1 (found 2 times)
g++.dg/ipa/devirt-22.C  -std=gnu++98  scan-ipa-dump-times cp
"Discovered a virtual call to a known target" 1 (found 2 times)
g++.dg/pr79095-4.C  -std=gnu++98  scan-tree-dump-times vrp2
"__builtin_memset \\(_[0-9]+, 0, [0-9]+\\)" 1 (found 0 times)
g++.dg/pr79095-4.C  -std=gnu++98  (test for warnings, line )

Christophe


Kyrill Tkachov Nov. 27, 2017, 10:48 a.m. UTC | #11
Hi Christophe,

On 26/11/17 20:01, Christophe Lyon wrote:
> On 26 November 2017 at 13:56, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 24 November 2017 at 20:38, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>>> On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
>>>> Hi Christophe,
>>>>
>>>>> After your commit, I have these reports:
>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>>>>
>>>>> After my commit, I have these reports:
>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>>>>>
>>>>> I haven't fully checked that my patch fixes all the regressions reported at
>>>>> r255064, but I don't see why my patch would introduce regressions.... So I
>>>>> think your patch is causing problems:
>>>>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>>>>>      gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>>>>      gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>>>>      gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>>>>
>>>>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>>>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
>>>>> (the 2 "BIG-REGR" entries)
>>>> This patch only introduced a few neon instrinsics in arm_neon.h, and most of these files don't use the header.
>>>>
>>>> gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new test.
>>>>
>>>> I will run some regressions over the weekend on an updated tree, but I can't understand how a not included header it can cause execution failures 😊
>>>> However most of those are vectorizer tests. It seems much more likely to me that vectorization is broken rather.
>>> Agreed. But note that many regressions are reported for the
>>> configurations --with-fpu vfpv3-d16-fp16
>>> at: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>> Maybe that's just a matter of arm_neon.h being included by some
>>> effective-target tests?
>>>
>>>
>> Hi Tamar,
>>
>> Good news, I have confirmed your obvious thoughts: I have run
>> validations of r255063+your patch fixed, and the results are clean:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html
>>
>> I have also compared r255063 to r255216 (that is I applied all patches
>> between yours and mine):
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255063-255126.patch/report-build-info.html
>> which confirms some regressions have been introduced in-between,
>> hidden by the problem in your patch.
>>
>> Some may be obvious to bisect, some less.
>>

Thank you very much for tracking these down.

> OK, so for gcc:
> FAIL: gcc.dg/ipa/inline-1.c scan-ipa-dump inline "op2 change 9.990000. of time"
> after r255103, which updated the test

Might be related to the various profile update cleanups that have been
going on over the last few weeks.

> several failures for gcc.target/arm/addr-modes-float.c which was
> introduced at r255111 (Charles is aware of that, probably just a
> matter of adding the right effective-target)

I agree.

> I'm still trying to reproduce the regression:
> FAIL: gcc.dg/vect/vect-nb-iter-ub-2.c execution test
> on armeb

Hmm, maybe something to do with the check_vect check that these tests do?
Or model flakiness...

>
> and for g++:
> g++.dg/ipa/devirt-22.C  -std=gnu++11  scan-ipa-dump-times cp
> "Discovered a virtual call to a known target" 1 (found 2 times)
> g++.dg/ipa/devirt-22.C  -std=gnu++14  scan-ipa-dump-times cp
> "Discovered a virtual call to a known target" 1 (found 2 times)
> g++.dg/ipa/devirt-22.C  -std=gnu++98  scan-ipa-dump-times cp
> "Discovered a virtual call to a known target" 1 (found 2 times)
> g++.dg/pr79095-4.C  -std=gnu++98  scan-tree-dump-times vrp2
> "__builtin_memset \\(_[0-9]+, 0, [0-9]+\\)" 1 (found 0 times)
> g++.dg/pr79095-4.C  -std=gnu++98  (test for warnings, line )

I'd guess these are related to the profile update improvements as well.

Kyrill

Tamar Christina Nov. 27, 2017, 10:49 a.m. UTC | #12
Hi Christophe,

> -----Original Message-----
> From: Christophe Lyon [mailto:christophe.lyon@linaro.org]
> Sent: Sunday, November 26, 2017 20:01
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; gcc-patches@gcc.gnu.org;
> nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com
> Subject: Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]
>
> On 26 November 2017 at 13:56, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> > On 24 November 2017 at 20:38, Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> >> On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
> >>> Hi Christophe,
> >>>
> >>>>
> >>>> After your commit, I have these reports:
> >>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
> >>>>
> >>>> After my commit, I have these reports:
> >>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
> >>>>
> >>>> I haven't fully checked that my patch fixes all the regressions
> >>>> reported at r255064, but I don't see why my patch would introduce
> >>>> regressions.... So I think your patch is causing problems:
> >>>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
> >>>>     gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
> >>>>     gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
> >>>>     gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
> >>>>
> >>>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> >>>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
> >>>> (the 2 "BIG-REGR" entries)
> >>>
> >>> This patch only introduced a few neon instrinsics in arm_neon.h, and
> >>> most of these files don't use the header.
> >>>
> >>> gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new test.
> >>>
> >>> I will run some regressions over the weekend on an updated tree, but
> >>> I can't understand how a not included header it can cause execution
> >>> failures 😊
> >>> However most of those are vectorizer tests. It seems much more likely
> >>> to me that vectorization is broken rather.
> >>
> >> Agreed. But note that many regressions are reported for the
> >> configurations --with-fpu vfpv3-d16-fp16
> >> at: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
> >> Maybe that's just a matter of arm_neon.h being included by some
> >> effective-target tests?
> >>
> >>
> > Hi Tamar,
> >
> > Good news, I have confirmed your obvious thoughts: I have run
> > validations of r255063+your patch fixed, and the results are clean:
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html

Thanks for confirming! My own run finished as well. Sorry again for the breakage; I've
updated my scripts to exclude log files, so this shouldn't happen again.


diff mbox series

Patch
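
For readers unfamiliar with the instruction, here is a minimal scalar sketch of
what the 64-bit unsigned intrinsics in the patch below compute, assuming the
usual Armv8.2-A dot product semantics: each 32-bit accumulator lane gains the
sum of four byte products. The names ref_vdot_u32 and ref_vdot_lane_u32 are
hypothetical helpers for illustration only; they are not part of the patch or
of arm_neon.h, and plain arrays stand in for the NEON vector types.

```c
#include <stdint.h>

/* Scalar model of vdot_u32 (hypothetical helper, not from the patch):
   lane i of the accumulator r gains the sum of the four byte products
   a[4*i+j] * b[4*i+j] for j = 0..3.  */
void
ref_vdot_u32 (uint32_t r[2], const uint8_t a[8], const uint8_t b[8])
{
  for (int lane = 0; lane < 2; lane++)
    for (int j = 0; j < 4; j++)
      r[lane] += (uint32_t) a[4 * lane + j] * b[4 * lane + j];
}

/* Scalar model of vdot_lane_u32 (hypothetical helper): the 4-byte group
   of b selected by index (0 or 1) is reused for every accumulator lane.  */
void
ref_vdot_lane_u32 (uint32_t r[2], const uint8_t a[8], const uint8_t b[8],
		   int index)
{
  for (int lane = 0; lane < 2; lane++)
    for (int j = 0; j < 4; j++)
      r[lane] += (uint32_t) a[4 * lane + j] * b[4 * index + j];
}
```

With a = {1,2,3,4,5,6,7,8}, b = {1,1,1,1,2,2,2,2} and r = {10,20}, the first
model leaves r = {20,72}. The q-register variants follow the same pattern with
four accumulator lanes, and the signed variants use int8_t/int32_t instead.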

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18034,6 +18034,72 @@  vzipq_f16 (float16x8_t __a, float16x8_t __b)
 
 #endif
 
+/* Adv.SIMD Dot Product intrinsics.  */
+
+#if __ARM_ARCH >= 8
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+dotprod")
+
+__extension__ extern __inline uint32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
+{
+  return __builtin_neon_udotv8qi_uuuu (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_neon_udotv16qi_uuuu (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
+{
+  return __builtin_neon_sdotv8qi (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_sdotv16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_lane_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b, const int __index)
+{
+  return __builtin_neon_udot_lanev8qi_uuuus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_lane_u32 (uint32x4_t __r, uint8x16_t __a, uint8x8_t __b,
+		const int __index)
+{
+  return __builtin_neon_udot_lanev16qi_uuuus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_lane_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b, const int __index)
+{
+  return __builtin_neon_sdot_lanev8qi (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_lane_s32 (int32x4_t __r, int8x16_t __a, int8x8_t __b, const int __index)
+{
+  return __builtin_neon_sdot_lanev16qi (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..a422384b0a0140d4afb4ff4a04223dd20f8d9960
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c
@@ -0,0 +1,55 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_ok } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include <arm_neon.h>
+
+/* Unsigned Dot Product instructions.  */
+
+uint32x2_t ufoo (uint32x2_t r, uint8x8_t x, uint8x8_t y)
+{
+  return vdot_u32 (r, x, y);
+}
+
+uint32x4_t ufooq (uint32x4_t r, uint8x16_t x, uint8x16_t y)
+{
+  return vdotq_u32 (r, x, y);
+}
+
+uint32x2_t ufoo_lane (uint32x2_t r, uint8x8_t x, uint8x8_t y)
+{
+  return vdot_lane_u32 (r, x, y, 0);
+}
+
+uint32x4_t ufooq_lane (uint32x4_t r, uint8x16_t x, uint8x8_t y)
+{
+  return vdotq_lane_u32 (r, x, y, 0);
+}
+
+/* Signed Dot Product instructions.  */
+
+int32x2_t sfoo (int32x2_t r, int8x8_t x, int8x8_t y)
+{
+  return vdot_s32 (r, x, y);
+}
+
+int32x4_t sfooq (int32x4_t r, int8x16_t x, int8x16_t y)
+{
+  return vdotq_s32 (r, x, y);
+}
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, int8x8_t y)
+{
+  return vdot_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, int8x8_t y)
+{
+  return vdotq_lane_s32 (r, x, y, 0);
+}
+
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+\[#?[0-9]\]} 2 } } */
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?[0-9]\]} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
new file mode 100644
index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
@@ -0,0 +1,15 @@ 
+TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+
+__attribute__ ((noinline)) int
+foo1(int len) {
+  int i;
+  TYPE int result = 0;
+  TYPE short prod;
+
+  for (i=0; i<len; i++) {
+    prod = X[i] * Y[i];
+    result += prod;
+  }
+  return result;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-s8.c b/gcc/testsuite/gcc.target/arm/simd/vect-dot-s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6593404a682f76c8adce6b34de8ec4a2d0d97feb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-s8.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_ok } */
+/* { dg-add-options arm_v8_2a_dotprod_neon } */
+
+#define N 64
+#define TYPE signed
+
+#include "vect-dot-qi.h"
+
+/* { dg-final { scan-assembler-times {vsdot\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-u8.c b/gcc/testsuite/gcc.target/arm/simd/vect-dot-u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4d191ee827268f267c23427aa51101efbaeff38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-u8.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_ok } */
+/* { dg-add-options arm_v8_2a_dotprod_neon } */
+
+#define N 64
+#define TYPE unsigned
+
+#include "vect-dot-qi.h"
+
+/* { dg-final { scan-assembler-times {vudot\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 4 } } */
\ No newline at end of file
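
For readers of the tests above, here is a rough scalar model of what the unsigned 64-bit intrinsics are expected to compute. It is only a sketch and not part of the patch: the names ref_vdot_u32 and ref_vdot_lane_u32 are hypothetical, and plain arrays stand in for the NEON vector types so the code builds on any host. Each 32-bit accumulator lane receives the sum of four 8-bit products. In the lane form, the four bytes of the second operand are selected by the index and reused for every accumulator lane.

```c
#include <stdint.h>

/* Hypothetical scalar model of vdot_u32 (r, a, b): r has two 32-bit
   lanes, a and b each hold eight unsigned bytes.  Lane i of r
   accumulates the products of bytes 4*i .. 4*i+3 of a and b.  */
static void
ref_vdot_u32 (uint32_t r[2], const uint8_t a[8], const uint8_t b[8])
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 4; j++)
      r[i] += (uint32_t) a[4 * i + j] * b[4 * i + j];
}

/* Hypothetical scalar model of vdot_lane_u32 (r, a, b, lane): the
   4-byte group of b chosen by lane (0 or 1 for an 8-byte b) is
   multiplied against every 4-byte group of a.  */
static void
ref_vdot_lane_u32 (uint32_t r[2], const uint8_t a[8], const uint8_t b[8],
		   int lane)
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 4; j++)
      r[i] += (uint32_t) a[4 * i + j] * b[4 * lane + j];
}
```

The signed variants would follow the same shape with int8_t inputs and int32_t accumulators, and the q forms widen r to four lanes and a to sixteen bytes. The reduction loop in vect-dot-qi.h has the same multiply-and-accumulate pattern, which is presumably what the vectorizer tests rely on.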