Message ID | 20210806070450.1168329-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element. | expand |
liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > Hi: > Bootstrapped and regtested on x86_64-linux-gnu{-m32,} > Ok for trunk? I think if anything the canonicalisation should be the other way: if the shift amount is an in-range constant, we know that it fits within a vector element, and so the vector form should be preferred. IMO it's a wart that we support (say): (ashift:V4SI (reg:V4SI R) (const_int 1)) even though: (plus:V4SI (reg:V4SI R) (const_int 1)) (minus:V4SI (const_int 1) (reg:V4SI R)) are not well-formed (at least AFAIK). Thanks, Richard > > gcc/ChangeLog: > > PR rtl-optimization/101796 > * simplify-rtx.c > (simplify_context::simplify_binary_operation_1): Simplify > vector shift/rotate with const_vec_duplicate to vector > shift/rotate with const_int element. > > gcc/testsuite/ChangeLog: > > PR rtl-optimization/101796 > * gcc.target/i386/pr101796.c: New test. > --- > gcc/simplify-rtx.c | 15 ++++++ > gcc/testsuite/gcc.target/i386/pr101796.c | 65 ++++++++++++++++++++++++ > 2 files changed, 80 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr101796.c > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index a719f57870f..75f3e455562 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -3970,6 +3970,21 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, > return simplify_gen_binary (code, mode, op0, > gen_int_shift_amount (mode, val)); > } > + > + /* Optimize vector shift/rotate with const_vec_duplicate > + to vector shift/rotate with const_int element. > + /* TODO: vec_duplicate with variable can also be simplified, > + but GCC only require operand 2 of shift/rotate to be a scalar type > + which can have different modes in different backends, it makes > + simplication difficult to decide which mode should be choosed > + for shift/rotate count. */ > + if ((code == ASHIFTRT || code == LSHIFTRT > + || code == ASHIFT || code == ROTATERT > + || code == ROTATE) > + && const_vec_duplicate_p (op1)) > + return simplify_gen_binary (code, mode, op0, > + unwrap_const_vec_duplicate (op1)); > + > break; > > case ASHIFT: > diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c b/gcc/testsuite/gcc.target/i386/pr101796.c > new file mode 100644 > index 00000000000..c22d6267fe5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr101796.c > @@ -0,0 +1,65 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512bw -O2 " } */ > +/* { dg-final { scan-assembler-not "vpbroadcast" } } */ > +/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } } */ > +/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } } */ > +/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } } */ > +/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } } */ > +/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } } */ > + > +#include <immintrin.h> > + > +__m512i > +foo (__m512i a) > +{ > + return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3)); > +} > + > +__m512i > +foo1 (__m512i a) > +{ > + return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3)); > +} > + > +__m512i > +foo2 (__m512i a, long long b) > +{ > + return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3)); > +} > + > +__m512i > +foo3 (__m512i a) > +{ > + return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3)); > +} > + > +__m512i > +foo4 (__m512i a) > +{ > + return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3)); > +} > + > +__m512i > +foo5 (__m512i a, long long b) > +{ > + return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3)); > +} > + > +__m512i > +foo6 (__m512i a) > +{ > + return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3)); > +} > + > +__m512i > +foo7 (__m512i a) > +{ > + return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3)); > +} > + > +__m512i > +foo8 (__m512i a, long long b) > +{ > + return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3)); > +}
On Fri, Aug 06, 2021 at 11:55:52AM +0100, Richard Sandiford wrote: > liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > Hi: > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,} > > Ok for trunk? > > I think if anything the canonicalisation should be the other way: > if the shift amount is an in-range constant, we know that it fits > within a vector element, and so the vector form should be preferred. Yeah. And the canonicalisation needs to be documented *first*, i.e. we have to agree on it first, *before* patches doing this to simplify-rtx are acceptable. We don't do design-by-fait-accompli. Any canonicalisation also has to fit in well with other canonicalisations, or we will be better off not having canonical forms. If it turns out there is no good canonical form, we will simply have to handle both forms (or more than two perhaps). This isn't the end of the world, we have to do that already. If we can simplify things with a canonical form, that is great; if that causes too much extra work instead, it is not so great. These things have to be thought about. Segher
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index a719f57870f..75f3e455562 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -3970,6 +3970,21 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, return simplify_gen_binary (code, mode, op0, gen_int_shift_amount (mode, val)); } + + /* Optimize vector shift/rotate with const_vec_duplicate + to vector shift/rotate with const_int element. + /* TODO: vec_duplicate with variable can also be simplified, + but GCC only require operand 2 of shift/rotate to be a scalar type + which can have different modes in different backends, it makes + simplication difficult to decide which mode should be choosed + for shift/rotate count. */ + if ((code == ASHIFTRT || code == LSHIFTRT + || code == ASHIFT || code == ROTATERT + || code == ROTATE) + && const_vec_duplicate_p (op1)) + return simplify_gen_binary (code, mode, op0, + unwrap_const_vec_duplicate (op1)); + break; case ASHIFT: diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c b/gcc/testsuite/gcc.target/i386/pr101796.c new file mode 100644 index 00000000000..c22d6267fe5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101796.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -O2 " } */ +/* { dg-final { scan-assembler-not "vpbroadcast" } } */ +/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } } */ +/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } } */ +/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } } */ +/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } } */ +/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } } */ + +#include <immintrin.h> + +__m512i +foo (__m512i a) +{ + return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3)); +} + +__m512i +foo1 (__m512i a) +{ + return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3)); +} + +__m512i +foo2 (__m512i a, long long b) +{ + return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3)); +} + +__m512i +foo3 (__m512i a) +{ + return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3)); +} + +__m512i +foo4 (__m512i a) +{ + return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3)); +} + +__m512i +foo5 (__m512i a, long long b) +{ + return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3)); +} + +__m512i +foo6 (__m512i a) +{ + return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3)); +} + +__m512i +foo7 (__m512i a) +{ + return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3)); +} + +__m512i +foo8 (__m512i a, long long b) +{ + return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3)); +}