Message ID | CAMZc-bxLKHo0V=RXVu1KOv9AggFBKWWXX8zK07n5qDeBg0xrrA@mail.gmail.com |
---|---|
State | New |
Series | i386: Optimize vpblendvb on inverted mask register to vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908] |
ping

On Tue, Apr 27, 2021 at 5:58 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   As described in the subject line, this patch does the
> transformation below.
>
> -       vpcmpeqd        %ymm3, %ymm3, %ymm3
> -       vpandn  %ymm3, %ymm2, %ymm2
> -       vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
> +       vpblendvb       %ymm2, %ymm0, %ymm1, %ymm0
>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> gcc/ChangeLog:
>
>         PR target/99908
>         * config/i386/sse.md (<sse4_1_avx2>_pblendvb): Add
>         splitters for pblendvb of NOT mask register.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/99908
>         * gcc.target/i386/avx2-pr99908.c: New test.
>         * gcc.target/i386/sse4_1-pr99908.c: New test.
>
> --
> BR,
> Hongtao
On Tue, Apr 27, 2021 at 1:05 PM Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> [...]
>
> gcc/ChangeLog:
>
>         PR target/99908
>         * config/i386/sse.md (<sse4_1_avx2>_pblendvb): Add
>         splitters for pblendvb of NOT mask register.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/99908
>         * gcc.target/i386/avx2-pr99908.c: New test.
>         * gcc.target/i386/sse4_1-pr99908.c: New test.

OK.

Thanks,
Uros.
On Wed, May 12, 2021 at 4:36 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> [...]
>
> OK.
>
> Thanks,
> Uros.

Thanks for the review.
On Wed, May 12, 2021 at 1:42 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> [...]
>
> Thanks for the review.

OTOH, have you considered ix86_fold_builtin or
ix86_gimple_fold_builtin? These functions are implemented as builtins,
so perhaps the transformation can be more efficiently implemented by
calling these two target functions.

Uros.
On Wed, May 12, 2021 at 8:38 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> [...]
>
> OTOH, have you considered ix86_fold_builtin or
> ix86_gimple_fold_builtin? These functions are implemented as builtins,
> so perhaps the transformation can be more efficiently implemented by
> calling these two target functions.

Good idea, I'll try that.

>
> Uros.
On Thu, May 13, 2021 at 8:43 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, May 12, 2021 at 8:38 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > [...]
> >
> > OTOH, have you considered ix86_fold_builtin or
> > ix86_gimple_fold_builtin? These functions are implemented as builtins,
> > so perhaps the transformation can be more efficiently implemented by
> > calling these two target functions.
> Good idea, I'll try that.

I find it's not a good idea to fold andn into two GIMPLE statements:
they don't always get combined back into andn in RTL, so some
optimizations are lost. Folding blendv, however, looks clearly
beneficial.

> >
> > Uros.
>
> --
> BR,
> Hongtao
From e1daa651d201f9ab3a85a80a635746fcf4be70ab Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 7 Apr 2021 09:58:54 +0800
Subject: [PATCH] i386: Optimize vpblendvb on inverted mask register to
 vpblendvb on swapping the order of operand 1 and operand 2. [PR target/99908]

-       vpcmpeqd        %ymm3, %ymm3, %ymm3
-       vpandn  %ymm3, %ymm2, %ymm2
-       vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
+       vpblendvb       %ymm2, %ymm0, %ymm1, %ymm0

gcc/ChangeLog:

        PR target/99908
        * config/i386/sse.md (<sse4_1_avx2>_pblendvb): Add
        splitters for pblendvb of NOT mask register.

gcc/testsuite/ChangeLog:

        PR target/99908
        * gcc.target/i386/avx2-pr99908.c: New test.
        * gcc.target/i386/sse4_1-pr99908.c: New test.
---
 gcc/config/i386/sse.md                        | 29 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/avx2-pr99908.c  | 25 ++++++++++++++++
 .../gcc.target/i386/sse4_1-pr99908.c          | 23 +++++++++++++++
 3 files changed, 77 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr99908.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 897cf3eaea9..4ef22b428e4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17735,6 +17735,35 @@ (define_insn "<sse4_1_avx2>_pblendvb"
    (set_attr "btver2_decode" "vector,vector,vector")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_split
+  [(set (match_operand:VI1_AVX2 0 "register_operand")
+        (unspec:VI1_AVX2
+          [(match_operand:VI1_AVX2 1 "vector_operand")
+           (match_operand:VI1_AVX2 2 "register_operand")
+           (not:VI1_AVX2 (match_operand:VI1_AVX2 3 "register_operand"))]
+          UNSPEC_BLENDV))]
+  "TARGET_SSE4_1"
+  [(set (match_dup 0)
+        (unspec:VI1_AVX2
+          [(match_dup 2) (match_dup 1) (match_dup 3)]
+          UNSPEC_BLENDV))])
+
+(define_split
+  [(set (match_operand:VI1_AVX2 0 "register_operand")
+        (unspec:VI1_AVX2
+          [(match_operand:VI1_AVX2 1 "vector_operand")
+           (match_operand:VI1_AVX2 2 "register_operand")
+           (subreg:VI1_AVX2 (not (match_operand 3 "register_operand")) 0)]
+          UNSPEC_BLENDV))]
+  "TARGET_SSE4_1
+   && GET_MODE_CLASS (GET_MODE (operands[3])) == MODE_VECTOR_INT
+   && GET_MODE_SIZE (GET_MODE (operands[3])) == <MODE_SIZE>"
+  [(set (match_dup 0)
+        (unspec:VI1_AVX2
+          [(match_dup 2) (match_dup 1) (match_dup 4)]
+          UNSPEC_BLENDV))]
+  "operands[4] = gen_lowpart (<MODE>mode, operands[3]);")
+
 (define_insn_and_split "*<sse4_1_avx2>_pblendvb_lt"
   [(set (match_operand:VI1_AVX2 0 "register_operand" "=Yr,*x,x")
         (unspec:VI1_AVX2
diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr99908.c b/gcc/testsuite/gcc.target/i386/avx2-pr99908.c
new file mode 100644
index 00000000000..2775f3b50f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-pr99908.c
@@ -0,0 +1,25 @@
+/* PR target/99908 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tvpblendvb\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tvpcmpeq" } } */
+/* { dg-final { scan-assembler-not "\tvpandn" } } */
+
+#include <x86intrin.h>
+
+__m256i
+f1 (__m256i a, __m256i b, __m256i mask)
+{
+  return _mm256_blendv_epi8(a, b,
+                            _mm256_andnot_si256(mask, _mm256_set1_epi8(255)));
+}
+
+__m256i
+f2 (__v32qi x, __v32qi a, __v32qi b)
+{
+  x ^= (__v32qi) { -1, -1, -1, -1, -1, -1, -1, -1,
+                   -1, -1, -1, -1, -1, -1, -1, -1,
+                   -1, -1, -1, -1, -1, -1, -1, -1,
+                   -1, -1, -1, -1, -1, -1, -1, -1 };
+  return _mm256_blendv_epi8 ((__m256i) a, (__m256i) b, (__m256i) x);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c b/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c
new file mode 100644
index 00000000000..c13e730b220
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c
@@ -0,0 +1,23 @@
+/* PR target/99908 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.1 -mno-avx -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpblendvb\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tpcmpeq" } } */
+/* { dg-final { scan-assembler-not "\tpandn" } } */
+
+#include <x86intrin.h>
+
+__m128i
+f1 (__m128i a, __m128i b, __m128i mask)
+{
+  return _mm_blendv_epi8(a, b,
+                         _mm_andnot_si128(mask, _mm_set1_epi8(255)));
+}
+
+__m128i
+f2 (__v16qi x, __v16qi a, __v16qi b)
+{
+  x ^= (__v16qi) { -1, -1, -1, -1, -1, -1, -1, -1,
+                   -1, -1, -1, -1, -1, -1, -1, -1 };
+  return _mm_blendv_epi8 ((__m128i) a, (__m128i) b, (__m128i) x);
+}
--
2.18.1
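
For readers who want to convince themselves that the splitters are sound, the identity they rely on is blendv (a, b, ~mask) == blendv (b, a, mask). The following is only a scalar sketch of that identity, not code from the patch: it assumes the per-byte semantics of _mm256_blendv_epi8 (select b where the top bit of the mask byte is set, a otherwise), and the names blendv_model, inverted_mask_equals_swap and self_check are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 32 /* number of byte lanes in one ymm register */

/* Hypothetical scalar model of vpblendvb, with the argument order of
   _mm256_blendv_epi8 (a, b, mask): a lane takes b when the top bit of
   the mask byte is set, and a otherwise.  */
static void
blendv_model (uint8_t *dst, const uint8_t *a, const uint8_t *b,
              const uint8_t *mask)
{
  for (size_t i = 0; i < LANES; i++)
    dst[i] = (mask[i] & 0x80) ? b[i] : a[i];
}

/* Return 1 if blending with the inverted mask gives the same result as
   blending with the original mask and swapped data operands.  */
static int
inverted_mask_equals_swap (const uint8_t *a, const uint8_t *b,
                           const uint8_t *mask)
{
  uint8_t inv[LANES], lhs[LANES], rhs[LANES];

  for (size_t i = 0; i < LANES; i++)
    inv[i] = (uint8_t) ~mask[i];   /* what vpcmpeqd + vpandn computed */

  blendv_model (lhs, a, b, inv);   /* before: blend on the NOT mask */
  blendv_model (rhs, b, a, mask);  /* after: swap operands, keep mask */

  for (size_t i = 0; i < LANES; i++)
    if (lhs[i] != rhs[i])
      return 0;
  return 1;
}

/* Check the identity for every possible mask byte value, using the same
   mask byte in all lanes.  */
static void
self_check (void)
{
  uint8_t a[LANES], b[LANES], mask[LANES];

  for (size_t i = 0; i < LANES; i++)
    {
      a[i] = (uint8_t) i;
      b[i] = (uint8_t) (0xff - i);
    }
  for (unsigned m = 0; m < 256; m++)
    {
      for (size_t i = 0; i < LANES; i++)
        mask[i] = (uint8_t) m;
      assert (inverted_mask_equals_swap (a, b, mask));
    }
}
```

Only the top bit of each mask byte matters, so inverting the mask flips the selection in every lane, which is the same as keeping the mask and exchanging the two data operands. Calling self_check () from a small driver exercises all 256 mask byte values under this model.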