Message ID | CAMZc-bwiOhXM08E-FxUFjSiJREze2tQOL2K9ovz3XQYPF_PZ7g@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [i386] Split not+broadcast+pand to broadcast+pandn. | expand |
On Mon, May 24, 2021 at 11:03 PM Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi: > This patch is about to do transformation like below. > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk? > > from > notl %edi > vpbroadcastd %edi, %xmm0 > vpand %xmm1, %xmm0, %xmm0 > to > vpbroadcastd %edi, %xmm0 > vpandn %xmm1, %xmm0, %xmm0 > > gcc/ChangeLog: > > PR target/100711 > * config/i386/sse.md (*andnot<mode>3): New combine splitter > after it. > > gcc/testsuite/ChangeLog: > > PR target/100711 > * gcc.target/i386/avx2-pr100711.c: New test. > * gcc.target/i386/avx512bw-pr100711.c: New test. > Does it make sense to make this more generic and have combine/simplify rtx instead try: (vec_dup (not)) to (not (vec_dup)) Thanks, Andrew Pinski
On Tue, May 25, 2021 at 2:11 PM Andrew Pinski <pinskia@gmail.com> wrote: > > On Mon, May 24, 2021 at 11:03 PM Hongtao Liu via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > Hi: > > This patch is about to do transformation like below. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for trunk? > > > > from > > notl %edi > > vpbroadcastd %edi, %xmm0 > > vpand %xmm1, %xmm0, %xmm0 > > to > > vpbroadcastd %edi, %xmm0 > > vpandn %xmm1, %xmm0, %xmm0 > > > > gcc/ChangeLog: > > > > PR target/100711 > > * config/i386/sse.md (*andnot<mode>3): New combine splitter > > after it. > > > > gcc/testsuite/ChangeLog: > > > > PR target/100711 > > * gcc.target/i386/avx2-pr100711.c: New test. > > * gcc.target/i386/avx512bw-pr100711.c: New test. > > > > > Does it make sense to make this more generic and have combine/simplify > rtx instead try: > (vec_dup (not)) to (not (vec_dup)) Even w/ that, a combine splitter is still needed since we don't have any pandn patterns which contain op1 as vec_duplicate or "not" pattern for vector mode, generic simplification only helps combine/forprop to match more possibilities, but not split pattern by itself. > > Thanks, > Andrew Pinski
On Mon, May 24, 2021 at 11:23 PM Hongtao Liu <crazylht@gmail.com> wrote: > > On Tue, May 25, 2021 at 2:11 PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > On Mon, May 24, 2021 at 11:03 PM Hongtao Liu via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > Hi: > > > This patch is about to do transformation like below. > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > > Ok for trunk? > > > > > > from > > > notl %edi > > > vpbroadcastd %edi, %xmm0 > > > vpand %xmm1, %xmm0, %xmm0 > > > to > > > vpbroadcastd %edi, %xmm0 > > > vpandn %xmm1, %xmm0, %xmm0 > > > > > > gcc/ChangeLog: > > > > > > PR target/100711 > > > * config/i386/sse.md (*andnot<mode>3): New combine splitter > > > after it. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR target/100711 > > > * gcc.target/i386/avx2-pr100711.c: New test. > > > * gcc.target/i386/avx512bw-pr100711.c: New test. > > > > > > > > > Does it make sense to make this more generic and have combine/simplify > > rtx instead try: > > (vec_dup (not)) to (not (vec_dup)) > > Even w/ that, a combine splitter is still needed since we don't have > any pandn patterns which contain op1 as vec_duplicate or "not" pattern > for vector mode, generic simplification only helps combine/forprop to > match more possibilities, but not split pattern by itself. Huh? This is a 3->2 combining which definitely just happen. You don't need a "not" pattern for the vector mode for this to happen since the combining happens without an insn defined. Thanks, Andrew Pinski > > > > > Thanks, > > Andrew Pinski > > > > -- > BR, > Hongtao
On Tue, May 25, 2021 at 2:29 PM Andrew Pinski <pinskia@gmail.com> wrote: > > On Mon, May 24, 2021 at 11:23 PM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Tue, May 25, 2021 at 2:11 PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > > > On Mon, May 24, 2021 at 11:03 PM Hongtao Liu via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > Hi: > > > > This patch is about to do transformation like below. > > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > > > Ok for trunk? > > > > > > > > from > > > > notl %edi > > > > vpbroadcastd %edi, %xmm0 > > > > vpand %xmm1, %xmm0, %xmm0 > > > > to > > > > vpbroadcastd %edi, %xmm0 > > > > vpandn %xmm1, %xmm0, %xmm0 > > > > > > > > gcc/ChangeLog: > > > > > > > > PR target/100711 > > > > * config/i386/sse.md (*andnot<mode>3): New combine splitter > > > > after it. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR target/100711 > > > > * gcc.target/i386/avx2-pr100711.c: New test. > > > > * gcc.target/i386/avx512bw-pr100711.c: New test. > > > > > > > > > > > > > Does it make sense to make this more generic and have combine/simplify > > > rtx instead try: > > > (vec_dup (not)) to (not (vec_dup)) > > > > Even w/ that, a combine splitter is still needed since we don't have > > any pandn patterns which contain op1 as vec_duplicate or "not" pattern > > for vector mode, generic simplification only helps combine/forprop to > > match more possibilities, but not split pattern by itself. > > Huh? This is a 3->2 combining which definitely just happen. Oh, I don't know that, thanks for the classification, I thought combine only does n->1. > You don't need a "not" pattern for the vector mode for this to happen > since the combining happens without an insn defined. > > Thanks, > Andrew Pinski > > > > > > > > > Thanks, > > > Andrew Pinski > > > > > > > > -- > > BR, > > Hongtao
Update patch: The new patch simplify (vec_duplicate (not (nonimmedaite_operand))) to (not (vec_duplicate (nonimmedaite_operand))). This is not a straightforward simplification, just adding some tendency to pull not out of vec_duplicate. For i386, it will enable below opt from notl %edi vpbroadcastd %edi, %xmm0 vpand %xmm1, %xmm0, %xmm0 to vpbroadcastd %edi, %xmm0 vpandn %xmm1, %xmm0, %xmm0 Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/100711 * simplify-rtx.c (simplify_unary_operation_1): Simplify (vec_duplicate (not (nonimmedaite_operand))) to (not (vec_duplicate (nonimmedaite_operand))). gcc/testsuite/ChangeLog: PR target/100711 * gcc.target/i386/avx2-pr100711.c: New test. * gcc.target/i386/avx512bw-pr100711.c: New test.
On Tue, May 25, 2021 at 6:17 PM Hongtao Liu <crazylht@gmail.com> wrote: > > Update patch: > The new patch simplify (vec_duplicate (not (nonimmedaite_operand))) > to (not (vec_duplicate (nonimmedaite_operand))). This is not a > straightforward simplification, just adding some tendency to pull not > out of vec_duplicate. > > For i386, it will enable below opt > > from > notl %edi > vpbroadcastd %edi, %xmm0 > vpand %xmm1, %xmm0, %xmm0 > to > vpbroadcastd %edi, %xmm0 > vpandn %xmm1, %xmm0, %xmm0 > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk? > gcc/ChangeLog: > > PR target/100711 > * simplify-rtx.c (simplify_unary_operation_1): > Simplify (vec_duplicate (not (nonimmedaite_operand))) > to (not (vec_duplicate (nonimmedaite_operand))). > > gcc/testsuite/ChangeLog: > > PR target/100711 > * gcc.target/i386/avx2-pr100711.c: New test. > * gcc.target/i386/avx512bw-pr100711.c: New test. This patch should not use nonimmedaite_operand at all in simplify-rtx.c. Rather use !CONSTANT_P (XEXP (op, 0)) instead. And even then (not CONST_INT) will never be there anyways as it will always be simplified to a constant in the first place. So removing that check is fine. Thanks, Andrew
On Wed, May 26, 2021 at 12:12 PM Andrew Pinski <pinskia@gmail.com> wrote: > > On Tue, May 25, 2021 at 6:17 PM Hongtao Liu <crazylht@gmail.com> wrote: > > > > Update patch: > > The new patch simplify (vec_duplicate (not (nonimmedaite_operand))) > > to (not (vec_duplicate (nonimmedaite_operand))). This is not a > > straightforward simplification, just adding some tendency to pull not > > out of vec_duplicate. > > > > For i386, it will enable below opt > > > > from > > notl %edi > > vpbroadcastd %edi, %xmm0 > > vpand %xmm1, %xmm0, %xmm0 > > to > > vpbroadcastd %edi, %xmm0 > > vpandn %xmm1, %xmm0, %xmm0 > > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for trunk? > > gcc/ChangeLog: > > > > PR target/100711 > > * simplify-rtx.c (simplify_unary_operation_1): > > Simplify (vec_duplicate (not (nonimmedaite_operand))) > > to (not (vec_duplicate (nonimmedaite_operand))). > > > > gcc/testsuite/ChangeLog: > > > > PR target/100711 > > * gcc.target/i386/avx2-pr100711.c: New test. > > * gcc.target/i386/avx512bw-pr100711.c: New test. > > This patch should not use nonimmedaite_operand at all in There's no simplification opportunity for nonimmediate_operand, but I'm not sure for other cases(not constants). Reading from codes in case NOT of simplify_unary_operation_1, there may be (vec_duplicate (not (plus X - 1))??? > simplify-rtx.c. Rather use !CONSTANT_P (XEXP (op, 0)) instead. > And even then (not CONST_INT) will never be there anyways as it will > always be simplified to a constant in the first place. So removing > that check is fine. > > Thanks, > Andrew
On Wed, May 26, 2021 at 1:17 PM Hongtao Liu <crazylht@gmail.com> wrote: > > On Wed, May 26, 2021 at 12:12 PM Andrew Pinski <pinskia@gmail.com> wrote: > > > > On Tue, May 25, 2021 at 6:17 PM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > Update patch: > > > The new patch simplify (vec_duplicate (not (nonimmedaite_operand))) > > > to (not (vec_duplicate (nonimmedaite_operand))). This is not a > > > straightforward simplification, just adding some tendency to pull not > > > out of vec_duplicate. > > > > > > For i386, it will enable below opt > > > > > > from > > > notl %edi > > > vpbroadcastd %edi, %xmm0 > > > vpand %xmm1, %xmm0, %xmm0 > > > to > > > vpbroadcastd %edi, %xmm0 > > > vpandn %xmm1, %xmm0, %xmm0 > > > > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > > Ok for trunk? > > > gcc/ChangeLog: > > > > > > PR target/100711 > > > * simplify-rtx.c (simplify_unary_operation_1): > > > Simplify (vec_duplicate (not (nonimmedaite_operand))) > > > to (not (vec_duplicate (nonimmedaite_operand))). > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR target/100711 > > > * gcc.target/i386/avx2-pr100711.c: New test. > > > * gcc.target/i386/avx512bw-pr100711.c: New test. > > > > This patch should not use nonimmedaite_operand at all in > There's no simplification opportunity for nonimmediate_operand, but > I'm not sure for other cases(not constants). > Reading from codes in case NOT of simplify_unary_operation_1, there > may be (vec_duplicate (not (plus X - 1))??? After reconsidering, I think you're right, (not op) will be simplified in the first place, so the updated patch just pulls not out of vec_duplicate. > > > > Thanks, > > Andrew > > > > -- > BR, > Hongtao -- BR, Hongtao
On Tue, Jun 01, 2021 at 04:32:42PM +0800, Hongtao Liu wrote: [ no attachment to reply to ] Please send this with either the patch actually inline, or as attachment with content-disposition inline, no encoding, and a valid text mimetype. So that people can see it, also on the archives, and actually reply to it! I'll see if I can fix it up this time. Segher
> PR target/100711 > * simplify-rtx.c (simplify_unary_operation_1): > Simplify (vec_duplicate (not op)) to (not (vec_duplicate op)). This is not a simplification. If we want to do this we need to document this canonicalisation (in md.texi, "Insn Canonicalizations"). > + /* Prefer (not (vec_duplicate (nonimmedaite_operand))) > + to (vec_duplicate (not (nonimmedaite_operand))). */ What Andrew said here (also, it's misspelled :-) ) > + case VEC_DUPLICATE: > + if (GET_CODE (op) == NOT) > + return gen_rtx_NOT (mode, gen_rtx_VEC_DUPLICATE (mode, XEXP (op, 0))); > + break; If it isn't a canonicalisation you need to simplify the result, and then only do it if it does in fact simplify. You risk "simplification" loops if you don't. Segher
This is the updated patch.
Please discard this one, sorry for disturbing. Obviously I'm new to git send-email. On Wed, Jun 2, 2021 at 1:40 PM liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > This is the updated patch. > >
From 2a70b50fe3ebe129a66d8e4d5c8c025cb6df6e4c Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Fri, 21 May 2021 11:12:49 +0800 Subject: [PATCH] [i386] Split not+broadcast+pand to broadcast+pandn. Split notl %edi vpbroadcastd %edi, %xmm0 vpand %xmm1, %xmm0, %xmm0 to vpbroadcastd %edi, %xmm0 vpandn %xmm1, %xmm0, %xmm0 gcc/ChangeLog: PR target/100711 * config/i386/sse.md (*andnot<mode>3): New combine splitter after it. gcc/testsuite/ChangeLog: PR target/100711 * gcc.target/i386/avx2-pr100711.c: New test. * gcc.target/i386/avx512bw-pr100711.c: New test. --- gcc/config/i386/sse.md | 20 +++++ gcc/testsuite/gcc.target/i386/avx2-pr100711.c | 73 +++++++++++++++++++ .../gcc.target/i386/avx512bw-pr100711.c | 48 ++++++++++++ 3 files changed, 141 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr100711.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100711.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a4503ddcb73..999c7322aac 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3990,6 +3990,26 @@ (define_insn "*andnot<mode>3" ] (const_string "<ssevecmode>")))]) +;; Split +;; notl %edi +;; vpbroadcastd %edi, %xmm0 +;; vpand %xmm1, %xmm0, %xmm0 +;;to +;; vpbroadcastd %edi, %xmm0 +;; vpandn %xmm1, %xmm0, %xmm0 + +(define_split + [(set (match_operand:VI 0 "register_operand") + (and:VI + (vec_duplicate:VI + (not:<ssescalarmode> + (match_operand:<ssescalarmode> 1 "register_operand"))) + (match_operand:VI 2 "bcst_vector_operand")))] + "TARGET_AVX2" + [(set (match_dup 3) (vec_duplicate:VI (match_dup 1))) + (set (match_dup 0) (and:VI (not:VI (match_dup 3)) (match_dup 2)))] + "operands[3] = gen_reg_rtx (<MODE>mode);") + (define_insn "*andnottf3" [(set (match_operand:TF 0 "register_operand" "=x,x,v,v") (and:TF diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr100711.c b/gcc/testsuite/gcc.target/i386/avx2-pr100711.c new file mode 100644 index 00000000000..5b144623873 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx2-pr100711.c @@ -0,0 +1,73 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -O2" } */ +/* { dg-final { scan-assembler-times "pandn" 8 } } */ +/* { dg-final { scan-assembler-not "not\[bwlq\]" } } */ +typedef char v16qi __attribute__((vector_size(16))); +typedef char v32qi __attribute__((vector_size(32))); +typedef short v8hi __attribute__((vector_size(16))); +typedef short v16hi __attribute__((vector_size(32))); +typedef int v4si __attribute__((vector_size(16))); +typedef int v8si __attribute__((vector_size(32))); +typedef long long v2di __attribute__((vector_size(16))); +typedef long long v4di __attribute__((vector_size(32))); + +v16qi +f1 (char a, v16qi c) +{ + char b = ~a; + return (__extension__(v16qi) {b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b}) & c; +} + +v32qi +f2 (char a, v32qi c) +{ + char b = ~a; + return (__extension__(v32qi) {b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b}) & c; +} + +v8hi +f3 (short a, v8hi c) +{ + short b = ~a; + return (__extension__(v8hi) {b, b, b, b, b, b, b, b}) & c; +} + +v16hi +f4 (short a, v16hi c) +{ + short b = ~a; + return (__extension__(v16hi) {b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b}) & c; +} + +v4si +f5 (int a, v4si c) +{ + int b = ~a; + return (__extension__(v4si) {b, b, b, b}) & c; +} + +v8si +f6 (int a, v8si c) +{ + int b = ~a; + return (__extension__(v8si) {b, b, b, b, b, b, b, b}) & c; +} + +v2di +f7 (long long a, v2di c) +{ + long long b = ~a; + return (__extension__(v2di) {b, b}) & c; +} + +v4di +f8 (long long a, v4di c) +{ + long long b = ~a; + return (__extension__(v4di) {b, b, b, b}) & c; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr100711.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr100711.c new file mode 100644 index 00000000000..f0a103d0bc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr100711.c @@ -0,0 +1,48 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -O2" } */ +/* { dg-final { scan-assembler-times "pandn" 4 } } */ +/* { dg-final { scan-assembler-not "not\[bwlq\]" } } */ + +typedef char v64qi __attribute__((vector_size(64))); +typedef short v32hi __attribute__((vector_size(64))); +typedef int v16si __attribute__((vector_size(64))); +typedef long long v8di __attribute__((vector_size(64))); + +v64qi +f1 (char a, v64qi c) +{ + char b = ~a; + return (__extension__(v64qi) {b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b}) & c; +} + +v32hi +f2 (short a, v32hi c) +{ + short b = ~a; + return (__extension__(v32hi) {b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b}) & c; +} + +v16si +f3 (int a, v16si c) +{ + int b = ~a; + return (__extension__(v16si) {b, b, b, b, b, b, b, b, + b, b, b, b, b, b, b, b}) & c; +} + +v8di +f4 (long long a, v8di c) +{ + long long b = ~a; + return (__extension__(v8di) {b, b, b, b, b, b, b, b}) & c; +} -- 2.18.1