Message ID | 20171106165336.GA12409@arm.com |
---|---|
State | New |
Series | [ARM] Dot Product NEON intrinsics [Patch (3/8)] | expand |

Ping

> -----Original Message-----
> From: Tamar Christina [mailto:tamar.christina@arm.com]
> Sent: Monday, November 6, 2017 16:54
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]
>
> Hi All,
>
> This patch adds the NEON intrinsics for Dot product.
>
> Dot product is available from ARMv8.2-a and onwards.
>
> Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> aarch64_be-none-elf with no issues found.
>
> Ok for trunk?
>
> gcc/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
>     * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
>     (vdot_s32, vdotq_s32): New.
>     (vdot_lane_u32, vdotq_lane_u32): New.
>     (vdot_lane_s32, vdotq_lane_s32): New.
>
>
> gcc/testsuite/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
>     * gcc.target/arm/simd/vdot-compile.c: New.
>     * gcc.target/arm/simd/vect-dot-qi.h: New.
>     * gcc.target/arm/simd/vect-dot-s8.c: New.
>     * gcc.target/arm/simd/vect-dot-u8.c: New
>
> --
Hi Tamar,

On 06/11/17 16:53, Tamar Christina wrote:
> Hi All,
>
> This patch adds the NEON intrinsics for Dot product.
>
> Dot product is available from ARMv8.2-a and onwards.
>
> Regtested on arm-none-eabi, armeb-none-eabi,
> aarch64-none-elf and aarch64_be-none-elf with no issues found.
>
> Ok for trunk?
>
> gcc/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
>     * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)

This should be config/arm/arm_neon.h

>     (vdot_s32, vdotq_s32): New.
>     (vdot_lane_u32, vdotq_lane_u32): New.
>     (vdot_lane_s32, vdotq_lane_s32): New.
>
>
> gcc/testsuite/
> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>
>     * gcc.target/arm/simd/vdot-compile.c: New.
>     * gcc.target/arm/simd/vect-dot-qi.h: New.
>     * gcc.target/arm/simd/vect-dot-s8.c: New.
>     * gcc.target/arm/simd/vect-dot-u8.c: New
>
> --

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
 #endif

+/* Adv.SIMD Dot Product intrinsics.  */

Please no full stop: "AdvSIMD".

+
+#pragma GCC push_options
+#if __ARM_ARCH >= 8
+#pragma GCC target ("arch=armv8.2-a+dotprod")

<snip>

diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
new file mode 100644
index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
@@ -0,0 +1,15 @@
+TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+
+__attribute__ ((noinline)) int
+foo1(int len) {
+  int i;
+  TYPE int result = 0;
+  TYPE short prod;
+
+  for (i=0; i<len; i++) {
+    prod = X[i] * Y[i];
+    result += prod;
+  }
+  return result;
+}
\ No newline at end of file

Please add new lines at the end of the new test files.
This applies to a few more new files in this patch.

Ok with these nits fixed.

Thanks,
Kyrill
On 22 November 2017 at 12:26, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi Tamar,
>
> On 06/11/17 16:53, Tamar Christina wrote:
>>
>> Hi All,
>>
>> This patch adds the NEON intrinsics for Dot product.
>>
>> Dot product is available from ARMv8.2-a and onwards.
>>
>> Regtested on arm-none-eabi, armeb-none-eabi,
>> aarch64-none-elf and aarch64_be-none-elf with no issues found.
>>
>> Ok for trunk?
>>
>> gcc/
>> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>>
>>     * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
>
>
> This should be config/arm/arm_neon.h
>
>>     (vdot_s32, vdotq_s32): New.
>>     (vdot_lane_u32, vdotq_lane_u32): New.
>>     (vdot_lane_s32, vdotq_lane_s32): New.
>>
>>
>> gcc/testsuite/
>> 2017-11-06  Tamar Christina  <tamar.christina@arm.com>
>>
>>     * gcc.target/arm/simd/vdot-compile.c: New.
>>     * gcc.target/arm/simd/vect-dot-qi.h: New.
>>     * gcc.target/arm/simd/vect-dot-s8.c: New.
>>     * gcc.target/arm/simd/vect-dot-u8.c: New
>>
>> --
>
>
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> index 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
>  #endif
> +/* Adv.SIMD Dot Product intrinsics.  */
>
> Please no full stop: "AdvSIMD".
>
> +
> +#pragma GCC push_options
> +#if __ARM_ARCH >= 8
> +#pragma GCC target ("arch=armv8.2-a+dotprod")
>
> <snip>
>

Not sure if Kyrill actually meant to comment about the three lines
above, but they have a bug:
#if should be before #pragma GCC push_options.

Indeed, after this patch was committed (r255064), I've noticed many
regressions, for instance p64_p128 is now unsupported. This is because
the arm_crypto_ok effective target now fails with this message:
XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline 'vaeseq_u8': target specific option mismatch

Not sure why this wasn't noticed in validations earlier?

Fixed as obvious (r255126).

Christophe

> diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
> @@ -0,0 +1,15 @@
> +TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> +TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> +
> +__attribute__ ((noinline)) int
> +foo1(int len) {
> +  int i;
> +  TYPE int result = 0;
> +  TYPE short prod;
> +
> +  for (i=0; i<len; i++) {
> +    prod = X[i] * Y[i];
> +    result += prod;
> +  }
> +  return result;
> +}
> \ No newline at end of file
>
> Please add new lines at the end of the new test files.
> This applies to a few more new files in this patch.
>
> Ok with these nits fixed.
>
> Thanks,
> Kyrill
>

2017-11-24  Christophe Lyon  <christophe.lyon@linaro.org>

    * config/arm/arm_neon.h: Fix pragma GCC push_options before vdot_u32.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c9a8d9..d2e936c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18036,8 +18036,8 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)

 /* AdvSIMD Dot Product intrinsics.  */

-#pragma GCC push_options
 #if __ARM_ARCH >= 8
+#pragma GCC push_options
 #pragma GCC target ("arch=armv8.2-a+dotprod")

 __extension__ extern __inline uint32x2_t
Hi Christophe, On 23/11/17 23:26, Christophe Lyon wrote: > On 22 November 2017 at 12:26, Kyrill Tkachov > <kyrylo.tkachov@foss.arm.com> wrote: >> Hi Tamar, >> >> On 06/11/17 16:53, Tamar Christina wrote: >>> Hi All, >>> >>> This patch adds the NEON intrinsics for Dot product. >>> >>> Dot product is available from ARMv8.2-a and onwards. >>> >>> Regtested on arm-none-eabi, armeb-none-eabi, >>> aarch64-none-elf and aarch64_be-none-elf with no issues found. >>> >>> Ok for trunk? >>> >>> gcc/ >>> 2017-11-06 Tamar Christina <tamar.christina@arm.com> >>> >>> * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32) >> >> This should be config/arm/arm_neon.h >> >>> (vdot_s32, vdotq_s32): New. >>> (vdot_lane_u32, vdotq_lane_u32): New. >>> (vdot_lane_s32, vdotq_lane_s32): New. >>> >>> >>> gcc/testsuite/ >>> 2017-11-06 Tamar Christina <tamar.christina@arm.com> >>> >>> * gcc.target/arm/simd/vdot-compile.c: New. >>> * gcc.target/arm/simd/vect-dot-qi.h: New. >>> * gcc.target/arm/simd/vect-dot-s8.c: New. >>> * gcc.target/arm/simd/vect-dot-u8.c: New >>> >>> -- >> >> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h >> index >> 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a >> 100644 >> --- a/gcc/config/arm/arm_neon.h >> +++ b/gcc/config/arm/arm_neon.h >> @@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b) >> #endif >> +/* Adv.SIMD Dot Product intrinsics. */ >> >> Please no full stop: "AdvSIMD". >> >> + >> +#pragma GCC push_options >> +#if __ARM_ARCH >= 8 >> +#pragma GCC target ("arch=armv8.2-a+dotprod") >> >> <snip> >> > Not sure if Kyrill actually meant to comment about the three lines > above, but they have a bug: > #if should be before #pragma GCC push_options. You're right, sorry for missing this :( > Indeed, after this patch was committed (r255064), I've noticed many > regressions, for instance > p64_p128 is now unsupported. 
This is because the arm_crypto_ok > effective target now fails > with this message: > XXX/arm_neon.h:16911:1: error: inlining failed in call to > always_inline 'vaeseq_u8': target specific option mismatch > > Not sure why this wasn't noticed in validations earlier? > > Fixed as obvious (r255126). Thank you for fixing this up. Kyrill > Christophe > >> diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h >> b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h >> new file mode 100644 >> index >> 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h >> @@ -0,0 +1,15 @@ >> +TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); >> +TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); >> + >> +__attribute__ ((noinline)) int >> +foo1(int len) { >> + int i; >> + TYPE int result = 0; >> + TYPE short prod; >> + >> + for (i=0; i<len; i++) { >> + prod = X[i] * Y[i]; >> + result += prod; >> + } >> + return result; >> +} >> \ No newline at end of file >> >> Please add new lines at the end of the new test files. >> This applies to a few more new files in this patch. >> >> Ok with these nits fixed. >> >> Thanks, >> Kyrill >>
> Not sure if Kyrill actually meant to comment about the three lines above, but
> they have a bug:
> #if should be before #pragma GCC push_options.
>
> Indeed, after this patch was committed (r255064), I've noticed many
> regressions, for instance
> p64_p128 is now unsupported. This is because the arm_crypto_ok effective
> target now fails with this message:
> XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline
> 'vaeseq_u8': target specific option mismatch
>
> Not sure why this wasn't noticed in validations earlier?

I still have the log files for these runs.

It seems that I was comparing the log files, which do not show this difference, instead of the sum files.

/d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.log ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.log | grep p64_p128

/d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.sum ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.sum | grep p64_p128
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O0
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O0 execution test
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O0 (test for excess errors)
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O1
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O1 execution test
PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O1 (test for excess errors)
NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O2

Sorry for missing this, I don't even know why these scripts accept the log files if they're always going to do the wrong thing.

Anyway, thanks for fixing this, and I'll make sure I'm using the sum files in the future.

Tamar

>
> Fixed as obvious (r255126).
On 24 November 2017 at 11:31, Tamar Christina <Tamar.Christina@arm.com> wrote: >> > >> Not sure if Kyrill actually meant to comment about the three lines above, but >> they have a bug: >> #if should be before #pragma GCC push_options. >> >> Indeed, after this patch was committed (r255064), I've noticed many >> regressions, for instance >> p64_p128 is now unsupported. This is because the arm_crypto_ok effective >> target now fails with this message: >> XXX/arm_neon.h:16911:1: error: inlining failed in call to always_inline >> 'vaeseq_u8': target specific option mismatch >> >> Not sure why this wasn't noticed in validations earlier? > > I still have the log files for these runs: > > It seems that I was comparing the log files instead of the sum files, which do not show this difference. > > /d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.log ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.log | grep p64_p128 > /d/t/g/s/gcc (dot-product-arm ↩☡=) contrib/dg-cmp-results.sh -v -v "" ../../build-arm-none-eabi/results.clean/vanilla/gcc.sum ../../build-arm-none-eabi/results.dotprod/vanilla/gcc.sum | grep p64_p128 > NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O0 > PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O0 execution test > PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O0 (test for excess errors) > NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O1 > PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O1 execution test > PASS->NA: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O1 (test for excess errors) > NA->UNSUPPORTED: gcc.target/aarch64/advsimd-intrinsics/p64_p128.c -O2 > > Sorry for missing this, I don't even know why these scripts accept the log files if they're always going to do the wrong thing. > > Anyway thanks for fixing this and I'll make sure I'm using the sum files in the future. 
>

Thanks for checking why you missed it.

That being said, I think there are a few more problems with your patch,
but there is a lot of "noise" in the reports.

After your commit, I have these reports:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html

After my commit, I have these reports:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html

I haven't fully checked that my patch fixes all the regressions
reported at r255064, but I don't see why my patch would introduce
regressions.... So I think your patch is causing problems:

* on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
  gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
  gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
  gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]

* on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
  and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
  (the 2 "BIG-REGR" entries) where a few tests fail:

  (arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
  gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/pr65947-14.c execution test

  (armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
  Executed from: gcc.dg/vect/vect.exp
  gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/pr51074.c execution test
  gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/pr65947-14.c execution test
  gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-nb-iter-ub-2.c execution test
  gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-nb-iter-ub-3.c execution test
  gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-strided-shift-1.c execution test
  gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-strided-u16-i3.c execution test
  Executed from: gcc.target/arm/arm.exp
  gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
  gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
  gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
  gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
  gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
  gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1 loops" 1 (found 0 times)

I haven't checked whether these tests were already failing before your
patch, and are just reported as new failures because they failed to
compile in the meantime.

Not sure I am clear :-)

Sorry for the delay and potentially hard to parse reports, I'm
struggling with infrastructure problems.

Thanks,

Christophe
Hi Christophe,

>
> After your commit, I have these reports:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>
> After my commit, I have these reports:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>
> I haven't fully checked that my patch fixes all the regressions reported at
> r255064, but I don't see why my patch would introduce regressions.... So I
> think your patch is causing problems:
> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>
> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
> (the 2 "BIG-REGR" entries)

This patch only introduced a few NEON intrinsics in arm_neon.h, and most of these files don't use the header.

gcc.dg/vect/pr65947-14.c doesn't exist in my tree, so it's a relatively new test.

I will run some regressions over the weekend on an updated tree, but I can't understand how a header that isn't included can cause execution failures 😊
However, most of those are vectorizer tests. It seems much more likely to me that vectorization itself is broken.
Thanks, Tamar > where a few tests fail: > (arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16): > gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test > gcc.dg/vect/pr65947-14.c execution test > > (armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16): > Executed from: gcc.dg/vect/vect.exp > gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test > gcc.dg/vect/pr51074.c execution test > gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test > gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test > gcc.dg/vect/pr65947-14.c execution test > gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-nb-iter-ub-2.c execution test > gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-nb-iter-ub-3.c execution test > gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-strided-shift-1.c execution test > gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-strided-u16-i3.c execution test > Executed from: gcc.target/arm/arm.exp > gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times) > gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ] > gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ] > gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32 > gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32 > gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1 > loops" 1 (found 0 times) > > I haven't checked whether this tests were already failing before your patch, > and are just reported as new failures because they failed to compile in the > mean time. > > Not sure I am clear :-) > > Sorry for the delay and potentially hard to parse reports, I'm struggling with > infrastructure problems. > > Thanks, > > Christophe > > > Tamar > > > >> > >> Fixed as obvious (r255126). 
On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
> Hi Christophe,
>
>>
>> After your commit, I have these reports:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>
>> After my commit, I have these reports:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>>
>> I haven't fully checked that my patch fixes all the regressions reported at
>> r255064, but I don't see why my patch would introduce regressions.... So I
>> think your patch is causing problems:
>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>
>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
>> (the 2 "BIG-REGR" entries)
>
> This patch only introduced a few neon instrinsics in arm_neon.h, and most of these files don't use the header.
>
> gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new test.
>
> I will run some regressions over the weekend on an updated tree, but I can't understand how a not included header it can cause execution failures 😊
> However most of those are vectorizer tests. It seems much more likely to me that vectorization is broken rather.

Agreed. But note that many regressions are reported for the
configurations --with-fpu vfpv3-d16-fp16 at:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
Maybe that's just a matter of arm_neon.h being included by some
effective-target tests?
On 24 November 2017 at 20:38, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 24 November 2017 at 19:05, Tamar Christina <Tamar.Christina@arm.com> wrote:
>> Hi Christophe,
>>
>>>
>>> After your commit, I have these reports:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
>>>
>>> After my commit, I have these reports:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255126/report-build-info.html
>>>
>>> I haven't fully checked that my patch fixes all the regressions reported at
>>> r255064, but I don't see why my patch would introduce regressions... So I
>>> think your patch is causing problems:
>>> * on armeb --with-fpu=neon-fp16: (the 2 "REGRESSED" entries):
>>> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>>
>>> * on arm-none-linux-gnueabihf --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>> and armeb-none-linux-gnueabihf --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16
>>> (the 2 "BIG-REGR" entries)
>>
>> This patch only introduced a few neon intrinsics in arm_neon.h, and most of these files don't use the header.
>>
>> gcc.dg/vect/pr65947-14.c doesn't exist in my tree so it's a relatively new test.
>>
>> I will run some regressions over the weekend on an updated tree, but I can't understand how a header that is not included can cause execution failures 😊
>> However most of those are vectorizer tests. It seems much more likely to me that vectorization is broken instead.
>
> Agreed. But note that many regressions are reported for the
> configurations --with-fpu vfpv3-d16-fp16
> at: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/255064/report-build-info.html
> Maybe that's just a matter of arm_neon.h being included by some
> effective-target tests?
>
>

Hi Tamar,

Good news, I have confirmed your obvious thoughts: I have run
validations of r255063+your patch fixed, and the results are clean:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html

I have also compared r255063 to r255126 (that is I applied all patches
between yours and mine):
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255063-255126.patch/report-build-info.html
which confirms some regressions have been introduced in-between,
hidden by the problem in your patch.

Some may be obvious to bisect, some less.

Christophe

>>
>> Thanks,
>> Tamar
>>
>>> where a few tests fail:
>>> (arm-none-linux-gnueabihf cortex-a5 vfpv3-d16-fp16):
>>> gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/pr65947-14.c execution test
>>>
>>> (armeb-none-linux-gnueabihf cortex-a9 vfpv3-d16-fp16):
>>> Executed from: gcc.dg/vect/vect.exp
>>> gcc.dg/vect/pr51074.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/pr51074.c execution test
>>> gcc.dg/vect/pr64252.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/pr65947-14.c execution test
>>> gcc.dg/vect/vect-cond-4.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/vect-nb-iter-ub-2.c execution test
>>> gcc.dg/vect/vect-nb-iter-ub-3.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/vect-nb-iter-ub-3.c execution test
>>> gcc.dg/vect/vect-strided-shift-1.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/vect-strided-shift-1.c execution test
>>> gcc.dg/vect/vect-strided-u16-i3.c -flto -ffat-lto-objects execution test
>>> gcc.dg/vect/vect-strided-u16-i3.c execution test
>>> Executed from: gcc.target/arm/arm.exp
>>> gcc.target/arm/attr-neon3.c scan-assembler-times vld1 1 (found 2 times)
>>> gcc.target/arm/neon-vfma-1.c scan-assembler vfma\\.f32[\t]+[dDqQ]
>>> gcc.target/arm/neon-vfms-1.c scan-assembler vfms\\.f32[\t]+[dDqQ]
>>> gcc.target/arm/neon-vmla-1.c scan-assembler vmla\\.i32
>>> gcc.target/arm/neon-vmls-1.c scan-assembler vmls\\.i32
>>> gcc.target/arm/vect-copysignf.c scan-tree-dump-times vect "vectorized 1 loops" 1 (found 0 times)
>>>
>>> I haven't checked whether these tests were already failing before your patch,
>>> and are just reported as new failures because they failed to compile in the
>>> meantime.
>>>
>>> Not sure I am clear :-)
>>>
>>> Sorry for the delay and potentially hard to parse reports, I'm struggling with
>>> infrastructure problems.
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>> > Tamar
>>> >
>>> >>
>>> >> Fixed as obvious (r255126).
>>> >>
>>> >> Christophe
>>> >>
>>> >> > diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>>> >> > new file mode 100644
>>> >> > index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
>>> >> > @@ -0,0 +1,15 @@
>>> >> > +TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> >> > +TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
>>> >> > +
>>> >> > +__attribute__ ((noinline)) int
>>> >> > +foo1(int len) {
>>> >> > +  int i;
>>> >> > +  TYPE int result = 0;
>>> >> > +  TYPE short prod;
>>> >> > +
>>> >> > +  for (i=0; i<len; i++) {
>>> >> > +    prod = X[i] * Y[i];
>>> >> > +    result += prod;
>>> >> > +  }
>>> >> > +  return result;
>>> >> > +}
>>> >> > \ No newline at end of file
>>> >> >
>>> >> > Please add new lines at the end of the new test files.
>>> >> > This applies to a few more new files in this patch.
>>> >> >
>>> >> > Ok with these nits fixed.
>>> >> >
>>> >> > Thanks,
>>> >> > Kyrill
>>> >> >
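For readers following the thread, the vect-dot-qi.h loop quoted above is a byte-wise multiply-accumulate, which is the pattern the dot-product intrinsics compute four bytes at a time. The block below is a rough scalar model of that arithmetic, not code from the patch. The names ref_udot and ref_udot_lane are hypothetical, and the per-lane behaviour is assumed from the Armv8.2-A description of the unsigned dot-product instruction (each 32-bit lane accumulates four 8-bit products; the by-lane form reuses one group of four bytes from the second operand).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model (assumed semantics, hypothetical name): 32-bit lane i of r
   accumulates the four products a[4*i + k] * b[4*i + k], k = 0..3.
   lanes is 2 for the 64-bit form and 4 for the 128-bit form.  */
static void
ref_udot (uint32_t *r, const uint8_t *a, const uint8_t *b, int lanes)
{
  for (int i = 0; i < lanes; i++)
    for (int k = 0; k < 4; k++)
      r[i] += (uint32_t) a[4 * i + k] * b[4 * i + k];
}

/* By-lane model (assumed semantics, hypothetical name): every output lane
   uses the same group of four bytes from b, selected by index.  b holds
   8 bytes, so index must be 0 or 1.  */
static void
ref_udot_lane (uint32_t *r, const uint8_t *a, const uint8_t *b,
               int lanes, int index)
{
  assert (index >= 0 && index < 2);
  for (int i = 0; i < lanes; i++)
    for (int k = 0; k < 4; k++)
      r[i] += (uint32_t) a[4 * i + k] * b[4 * index + k];
}
```

With lanes set to 2 this mirrors the shape of vdot_u32 and vdot_lane_u32 from the patch; the signed variants would follow the same structure with int8_t inputs and an int32_t accumulator.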
On 26 November 2017 at 13:56, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Hi Tamar,
>
> Good news, I have confirmed your obvious thoughts: I have run
> validations of r255063+your patch fixed, and the results are clean:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html
>
> I have also compared r255063 to r255126 (that is I applied all patches
> between yours and mine):
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255063-255126.patch/report-build-info.html
> which confirms some regressions have been introduced in-between,
> hidden by the problem in your patch.
>
> Some may be obvious to bisect, some less.
>

OK, so for gcc:
FAIL: gcc.dg/ipa/inline-1.c scan-ipa-dump inline "op2 change 9.990000. of time"
after r255103, which updated the test

several failures for gcc.target/arm/addr-modes-float.c which was
introduced at r255111 (Charles is aware of that, probably just a
matter of adding the right effective-target)

I'm still trying to reproduce the regression:
FAIL: gcc.dg/vect/vect-nb-iter-ub-2.c execution test
on armeb

and for g++:
g++.dg/ipa/devirt-22.C -std=gnu++11 scan-ipa-dump-times cp "Discovered a virtual call to a known target" 1 (found 2 times)
g++.dg/ipa/devirt-22.C -std=gnu++14 scan-ipa-dump-times cp "Discovered a virtual call to a known target" 1 (found 2 times)
g++.dg/ipa/devirt-22.C -std=gnu++98 scan-ipa-dump-times cp "Discovered a virtual call to a known target" 1 (found 2 times)
g++.dg/pr79095-4.C -std=gnu++98 scan-tree-dump-times vrp2 "__builtin_memset \\(_[0-9]+, 0, [0-9]+\\)" 1 (found 0 times)
g++.dg/pr79095-4.C -std=gnu++98 (test for warnings, line )

Christophe
Hi Christophe,

On 26/11/17 20:01, Christophe Lyon wrote:
> On 26 November 2017 at 13:56, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> Hi Tamar,
>>
>> Good news, I have confirmed your obvious thoughts: I have run
>> validations of r255063+your patch fixed, and the results are clean:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html
>>
>> I have also compared r255063 to r255126 (that is I applied all patches
>> between yours and mine):
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255063-255126.patch/report-build-info.html
>> which confirms some regressions have been introduced in-between,
>> hidden by the problem in your patch.
>>
>> Some may be obvious to bisect, some less.
>>

Thank you very much for tracking these down.

> OK, so for gcc:
> FAIL: gcc.dg/ipa/inline-1.c scan-ipa-dump inline "op2 change 9.990000. of time"
> after r255103, which updated the test

Might be related to the various profile update cleanups that have been
going on over the last few weeks.

> several failures for gcc.target/arm/addr-modes-float.c which was
> introduced at r255111 (Charles is aware of that, probably just a
> matter of adding the right effective-target)

I agree.

> I'm still trying to reproduce the regression:
> FAIL: gcc.dg/vect/vect-nb-iter-ub-2.c execution test
> on armeb

Hmm, maybe something to do with the check_vect check that these tests do?
Or model flakiness...

>
> and for g++:
> g++.dg/ipa/devirt-22.C -std=gnu++11 scan-ipa-dump-times cp
> "Discovered a virtual call to a known target" 1 (found 2 times)
> g++.dg/ipa/devirt-22.C -std=gnu++14 scan-ipa-dump-times cp
> "Discovered a virtual call to a known target" 1 (found 2 times)
> g++.dg/ipa/devirt-22.C -std=gnu++98 scan-ipa-dump-times cp
> "Discovered a virtual call to a known target" 1 (found 2 times)
> g++.dg/pr79095-4.C -std=gnu++98 scan-tree-dump-times vrp2
> "__builtin_memset \\(_[0-9]+, 0, [0-9]+\\)" 1 (found 0 times)
> g++.dg/pr79095-4.C -std=gnu++98 (test for warnings, line )

I'd guess these are related to the profile update improvements as well.

Kyrill
Hi Christophe,

> -----Original Message-----
> From: Christophe Lyon [mailto:christophe.lyon@linaro.org]
> Sent: Sunday, November 26, 2017 20:01
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; gcc-patches@gcc.gnu.org;
> nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com
> Subject: Re: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]
>
> On 26 November 2017 at 13:56, Christophe Lyon <christophe.lyon@linaro.org>
> wrote:
> > Hi Tamar,
> >
> > Good news, I have confirmed your obvious thoughts: I have run
> > validations of r255063+your patch fixed, and the results are clean:
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255063-r255064-fixed.patch/report-build-info.html

Thanks for confirming! My own finished as well. Sorry again for the breakage, I've
updated my scripts to exclude log files so this shouldn't happen again!
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0d436e83d0f01f0c86f8d6a25f84466c841c7e11..419080417901f343737741e334cbff818bb1e70a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18034,6 +18034,72 @@ vzipq_f16 (float16x8_t __a, float16x8_t __b)
 #endif
 
+/* Adv.SIMD Dot Product intrinsics. */
+
+#pragma GCC push_options
+#if __ARM_ARCH >= 8
+#pragma GCC target ("arch=armv8.2-a+dotprod")
+
+__extension__ extern __inline uint32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
+{
+  return __builtin_neon_udotv8qi_uuuu (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_neon_udotv16qi_uuuu (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
+{
+  return __builtin_neon_sdotv8qi (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_sdotv16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_lane_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b, const int __index)
+{
+  return __builtin_neon_udot_lanev8qi_uuuus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_lane_u32 (uint32x4_t __r, uint8x16_t __a, uint8x8_t __b,
+                const int __index)
+{
+  return __builtin_neon_udot_lanev16qi_uuuus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdot_lane_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b, const int __index)
+{
+  return __builtin_neon_sdot_lanev8qi (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vdotq_lane_s32 (int32x4_t __r, int8x16_t __a, int8x8_t __b, const int __index)
+{
+  return __builtin_neon_sdot_lanev16qi (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..a422384b0a0140d4afb4ff4a04223dd20f8d9960
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_ok } */
+/* { dg-add-options arm_v8_2a_dotprod_neon } */
+
+#include <arm_neon.h>
+
+/* Unsigned Dot Product instructions. */
+
+uint32x2_t ufoo (uint32x2_t r, uint8x8_t x, uint8x8_t y)
+{
+  return vdot_u32 (r, x, y);
+}
+
+uint32x4_t ufooq (uint32x4_t r, uint8x16_t x, uint8x16_t y)
+{
+  return vdotq_u32 (r, x, y);
+}
+
+uint32x2_t ufoo_lane (uint32x2_t r, uint8x8_t x, uint8x8_t y)
+{
+  return vdot_lane_u32 (r, x, y, 0);
+}
+
+uint32x4_t ufooq_lane (uint32x4_t r, uint8x16_t x, uint8x8_t y)
+{
+  return vdotq_lane_u32 (r, x, y, 0);
+}
+
+/* Signed Dot Product instructions. */
+
+int32x2_t sfoo (int32x2_t r, int8x8_t x, int8x8_t y)
+{
+  return vdot_s32 (r, x, y);
+}
+
+int32x4_t sfooq (int32x4_t r, int8x16_t x, int8x16_t y)
+{
+  return vdotq_s32 (r, x, y);
+}
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, int8x8_t y)
+{
+  return vdot_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, int8x8_t y)
+{
+  return vdotq_lane_s32 (r, x, y, 0);
+}
+
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\td[0-9]+, d[0-9]+, d[0-9]+\[#?[0-9]\]} 2 } } */
+/* { dg-final { scan-assembler-times {v[us]dot\.[us]8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?[0-9]\]} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
new file mode 100644
index 0000000000000000000000000000000000000000..90b00aff95cfef96d1963be17673dc191cc71169
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-qi.h
@@ -0,0 +1,15 @@
+TYPE char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+TYPE char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+
+__attribute__ ((noinline)) int
+foo1(int len) {
+  int i;
+  TYPE int result = 0;
+  TYPE short prod;
+
+  for (i=0; i<len; i++) {
+    prod = X[i] * Y[i];
+    result += prod;
+  }
+  return result;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-s8.c b/gcc/testsuite/gcc.target/arm/simd/vect-dot-s8.c
new file mode 100644
index 0000000000000000000000000000000000000000..6593404a682f76c8adce6b34de8ec4a2d0d97feb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-s8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_ok } */
+/* { dg-add-options arm_v8_2a_dotprod_neon } */
+
+#define N 64
+#define TYPE signed
+
+#include "vect-dot-qi.h"
+
+/* { dg-final { scan-assembler-times {vsdot\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/simd/vect-dot-u8.c b/gcc/testsuite/gcc.target/arm/simd/vect-dot-u8.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4d191ee827268f267c23427aa51101efbaeff38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vect-dot-u8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_ok } */
+/* { dg-add-options arm_v8_2a_dotprod_neon } */
+
+#define N 64
+#define TYPE unsigned
+
+#include "vect-dot-qi.h"
+
+/* { dg-final { scan-assembler-times {vudot\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 4 } } */
\ No newline at end of file