Message ID | 20210910043632.1087666-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode. | expand |
On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote: > Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)), > optimizer wouldn't simplify if op0 has different mode with M, but that's too > restrict which will prevent below optimization, the condition can be relaxed > to op0 must have same inner mode with M. > > (set (reg:V2DF 87 [ xx ]) > (vec_concat:V2DF (vec_select:DF (reg:V4DF 92) > (parallel [ > (const_int 2 [0x2]) > ])) > (vec_select:DF (reg:V4DF 92) > (parallel [ > (const_int 3 [0x3]) > ])))) > > Bootsrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * simplify-rtx.c > (simplify_context::simplify_binary_operation_1): Relax > condition of simplifying (vec_concat:M (vec_select op0 > index0)(vec_select op1 index1)) to allow different modes > between op0 and M, but have same inner mode. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/vect-rebuild.c: > * gcc.target/i386/avx512f-vect-rebuild.c: New test. Funny, I was looking at something rather similar recently, but never pushed on it because we were going to need too many entries in the parallel selector. I'm not convinced that we need the inner mode to match anything. As long as the vec_concat's mode is twice the size of the vec_select modes and the vec_select mode is <= the mode of its operands ISTM this is fine. We might want the modes of the vec_select to match, but I don't think that's strictly necessary either, they just need to be the same size. ie, we could have somethig like (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) I'm not sure if that level of generality is useful though. If we want the modes of the vec_selects to match I think we could still support (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) Thoughts? jeff Jeff > --- > gcc/simplify-rtx.c | 3 ++- > .../gcc.target/i386/avx512f-vect-rebuild.c | 21 +++++++++++++++++++ > gcc/testsuite/gcc.target/i386/vect-rebuild.c | 2 +- > 3 files changed, 24 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index ebad5cb5a79..16286befd79 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, > if (GET_CODE (trueop0) == VEC_SELECT > && GET_CODE (trueop1) == VEC_SELECT > && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) > - && GET_MODE (XEXP (trueop0, 0)) == mode) > + && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > + == GET_MODE_INNER(mode)) > { > rtx par0 = XEXP (trueop0, 1); > rtx par1 = XEXP (trueop1, 1); > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > new file mode 100644 > index 00000000000..aef6855aa46 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */ > + > +typedef double v2df __attribute__ ((__vector_size__ (16))); > +typedef double v4df __attribute__ ((__vector_size__ (32))); > + > +v2df h (v4df x) > +{ > + v2df xx = { x[2], x[3] }; > + return xx; > +} > + > +v4df f2 (v4df x) > +{ > + v4df xx = { x[0], x[1], x[2], x[3] }; > + return xx; > +} > + > +/* { dg-final { scan-assembler-not "unpck" } } */ > +/* { dg-final { scan-assembler-not "valign" } } */ > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > index 570967f6b5c..8e85b98bf1d 100644 > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > @@ -30,4 +30,4 @@ v2df h (v4df x) > > /* { dg-final { scan-assembler-not "unpck" } } */ > /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote: > > Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)), > > optimizer wouldn't simplify if op0 has different mode with M, but that's too > > restrict which will prevent below optimization, the condition can be relaxed > > to op0 must have same inner mode with M. > > > > (set (reg:V2DF 87 [ xx ]) > > (vec_concat:V2DF (vec_select:DF (reg:V4DF 92) > > (parallel [ > > (const_int 2 [0x2]) > > ])) > > (vec_select:DF (reg:V4DF 92) > > (parallel [ > > (const_int 3 [0x3]) > > ])))) > > > > Bootsrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * simplify-rtx.c > > (simplify_context::simplify_binary_operation_1): Relax > > condition of simplifying (vec_concat:M (vec_select op0 > > index0)(vec_select op1 index1)) to allow different modes > > between op0 and M, but have same inner mode. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/vect-rebuild.c: > > * gcc.target/i386/avx512f-vect-rebuild.c: New test. > Funny, I was looking at something rather similar recently, but never > pushed on it because we were going to need too many entries in the > parallel selector. > > I'm not convinced that we need the inner mode to match anything. As > long as the vec_concat's mode is twice the size of the vec_select modes > and the vec_select mode is <= the mode of its operands ISTM this is > fine. We might want the modes of the vec_select to match, but I don't > think that's strictly necessary either, they just need to be the same > size. ie, we could have somethig like > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > I'm not sure if that level of generality is useful though. If we want > the modes of the vec_selects to match I think we could still support > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > Thoughts? I think the component or scalar modes of the elements to concat need to match the component mode of the result. I don't think you example involving a cat of DF and DI is too useful - but you could use a subreg around the DI value ;) Richard. > > jeff > > Jeff > > > > --- > > gcc/simplify-rtx.c | 3 ++- > > .../gcc.target/i386/avx512f-vect-rebuild.c | 21 +++++++++++++++++++ > > gcc/testsuite/gcc.target/i386/vect-rebuild.c | 2 +- > > 3 files changed, 24 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index ebad5cb5a79..16286befd79 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, > > if (GET_CODE (trueop0) == VEC_SELECT > > && GET_CODE (trueop1) == VEC_SELECT > > && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) > > - && GET_MODE (XEXP (trueop0, 0)) == mode) > > + && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > > + == GET_MODE_INNER(mode)) > > { > > rtx par0 = XEXP (trueop0, 1); > > rtx par1 = XEXP (trueop1, 1); > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > new file mode 100644 > > index 00000000000..aef6855aa46 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */ > > + > > +typedef double v2df __attribute__ ((__vector_size__ (16))); > > +typedef double v4df __attribute__ ((__vector_size__ (32))); > > + > > +v2df h (v4df x) > > +{ > > + v2df xx = { x[2], x[3] }; > > + return xx; > > +} > > + > > +v4df f2 (v4df x) > > +{ > > + v4df xx = { x[0], x[1], x[2], x[3] }; > > + return xx; > > +} > > + > > +/* { dg-final { scan-assembler-not "unpck" } } */ > > +/* { dg-final { scan-assembler-not "valign" } } */ > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ > > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > index 570967f6b5c..8e85b98bf1d 100644 > > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > @@ -30,4 +30,4 @@ v2df h (v4df x) > > > > /* { dg-final { scan-assembler-not "unpck" } } */ > > /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ > > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ >
On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote: > > Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)), > > optimizer wouldn't simplify if op0 has different mode with M, but that's too > > restrict which will prevent below optimization, the condition can be relaxed > > to op0 must have same inner mode with M. > > > > (set (reg:V2DF 87 [ xx ]) > > (vec_concat:V2DF (vec_select:DF (reg:V4DF 92) > > (parallel [ > > (const_int 2 [0x2]) > > ])) > > (vec_select:DF (reg:V4DF 92) > > (parallel [ > > (const_int 3 [0x3]) > > ])))) > > > > Bootsrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * simplify-rtx.c > > (simplify_context::simplify_binary_operation_1): Relax > > condition of simplifying (vec_concat:M (vec_select op0 > > index0)(vec_select op1 index1)) to allow different modes > > between op0 and M, but have same inner mode. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/vect-rebuild.c: > > * gcc.target/i386/avx512f-vect-rebuild.c: New test. > Funny, I was looking at something rather similar recently, but never > pushed on it because we were going to need too many entries in the > parallel selector. > > I'm not convinced that we need the inner mode to match anything. As > long as the vec_concat's mode is twice the size of the vec_select modes > and the vec_select mode is <= the mode of its operands ISTM this is > fine. We might want the modes of the vec_select to match, but I don't > think that's strictly necessary either, they just need to be the same > size. ie, we could have somethig like > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > I'm not sure if that level of generality is useful though. If we want > the modes of the vec_selects to match I think we could still support > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) The first operand of vec_select is required to be the same here. and I guess (vec_select:DF (subreg:V4DF (reg:V8DF) 0) will be simplified to (vec_select:DF (reg:V8DF))? > > Thoughts? > > jeff > > Jeff > > > > --- > > gcc/simplify-rtx.c | 3 ++- > > .../gcc.target/i386/avx512f-vect-rebuild.c | 21 +++++++++++++++++++ > > gcc/testsuite/gcc.target/i386/vect-rebuild.c | 2 +- > > 3 files changed, 24 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index ebad5cb5a79..16286befd79 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, > > if (GET_CODE (trueop0) == VEC_SELECT > > && GET_CODE (trueop1) == VEC_SELECT > > && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) > > - && GET_MODE (XEXP (trueop0, 0)) == mode) > > + && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > > + == GET_MODE_INNER(mode)) > > { > > rtx par0 = XEXP (trueop0, 1); > > rtx par1 = XEXP (trueop1, 1); > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > new file mode 100644 > > index 00000000000..aef6855aa46 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */ > > + > > +typedef double v2df __attribute__ ((__vector_size__ (16))); > > +typedef double v4df __attribute__ ((__vector_size__ (32))); > > + > > +v2df h (v4df x) > > +{ > > + v2df xx = { x[2], x[3] }; > > + return xx; > > +} > > + > > +v4df f2 (v4df x) > > +{ > > + v4df xx = { x[0], x[1], x[2], x[3] }; > > + return xx; > > +} > > + > > +/* { dg-final { scan-assembler-not "unpck" } } */ > > +/* { dg-final { scan-assembler-not "valign" } } */ > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ > > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > index 570967f6b5c..8e85b98bf1d 100644 > > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > @@ -30,4 +30,4 @@ v2df h (v4df x) > > > > /* { dg-final { scan-assembler-not "unpck" } } */ > > /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ > > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ >
On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote: > > Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)), > > optimizer wouldn't simplify if op0 has different mode with M, but that's too > > restrict which will prevent below optimization, the condition can be relaxed > > to op0 must have same inner mode with M. > > > > (set (reg:V2DF 87 [ xx ]) > > (vec_concat:V2DF (vec_select:DF (reg:V4DF 92) > > (parallel [ > > (const_int 2 [0x2]) > > ])) > > (vec_select:DF (reg:V4DF 92) > > (parallel [ > > (const_int 3 [0x3]) > > ])))) > > > > Bootsrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * simplify-rtx.c > > (simplify_context::simplify_binary_operation_1): Relax > > condition of simplifying (vec_concat:M (vec_select op0 > > index0)(vec_select op1 index1)) to allow different modes > > between op0 and M, but have same inner mode. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/vect-rebuild.c: > > * gcc.target/i386/avx512f-vect-rebuild.c: New test. > Funny, I was looking at something rather similar recently, but never > pushed on it because we were going to need too many entries in the > parallel selector. > > I'm not convinced that we need the inner mode to match anything. As > long as the vec_concat's mode is twice the size of the vec_select modes > and the vec_select mode is <= the mode of its operands ISTM this is > fine. We might want the modes of the vec_select to match, but I don't > think that's strictly necessary either, they just need to be the same > size. ie, we could have somethig like If they're different sizes, i.e, something like below should also be legal? (vec_concat:V8SF (vec_select:V2SF (reg:V16SF)) (vec_select:V6SF (reg:V16SF))) > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > I'm not sure if that level of generality is useful though. If we want > the modes of the vec_selects to match I think we could still support > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > Thoughts? > > jeff > > Jeff > > > > --- > > gcc/simplify-rtx.c | 3 ++- > > .../gcc.target/i386/avx512f-vect-rebuild.c | 21 +++++++++++++++++++ > > gcc/testsuite/gcc.target/i386/vect-rebuild.c | 2 +- > > 3 files changed, 24 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index ebad5cb5a79..16286befd79 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, > > if (GET_CODE (trueop0) == VEC_SELECT > > && GET_CODE (trueop1) == VEC_SELECT > > && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) > > - && GET_MODE (XEXP (trueop0, 0)) == mode) > > + && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > > + == GET_MODE_INNER(mode)) > > { > > rtx par0 = XEXP (trueop0, 1); > > rtx par1 = XEXP (trueop1, 1); > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > new file mode 100644 > > index 00000000000..aef6855aa46 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */ > > + > > +typedef double v2df __attribute__ ((__vector_size__ (16))); > > +typedef double v4df __attribute__ ((__vector_size__ (32))); > > + > > +v2df h (v4df x) > > +{ > > + v2df xx = { x[2], x[3] }; > > + return xx; > > +} > > + > > +v4df f2 (v4df x) > > +{ > > + v4df xx = { x[0], x[1], x[2], x[3] }; > > + return xx; > > +} > > + > > +/* { dg-final { scan-assembler-not "unpck" } } */ > > +/* { dg-final { scan-assembler-not "valign" } } */ > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ > > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > index 570967f6b5c..8e85b98bf1d 100644 > > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > @@ -30,4 +30,4 @@ v2df h (v4df x) > > > > /* { dg-final { scan-assembler-not "unpck" } } */ > > /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ > > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ >
ping On Mon, Sep 13, 2021 at 11:19 PM Hongtao Liu <crazylht@gmail.com> wrote: > > On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote: > > > Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)), > > > optimizer wouldn't simplify if op0 has different mode with M, but that's too > > > restrict which will prevent below optimization, the condition can be relaxed > > > to op0 must have same inner mode with M. > > > > > > (set (reg:V2DF 87 [ xx ]) > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF 92) > > > (parallel [ > > > (const_int 2 [0x2]) > > > ])) > > > (vec_select:DF (reg:V4DF 92) > > > (parallel [ > > > (const_int 3 [0x3]) > > > ])))) > > > > > > Bootsrapped and regtested on x86_64-linux-gnu{-m32,}. > > > Ok for trunk? > > > > > > gcc/ChangeLog: > > > > > > * simplify-rtx.c > > > (simplify_context::simplify_binary_operation_1): Relax > > > condition of simplifying (vec_concat:M (vec_select op0 > > > index0)(vec_select op1 index1)) to allow different modes > > > between op0 and M, but have same inner mode. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/vect-rebuild.c: > > > * gcc.target/i386/avx512f-vect-rebuild.c: New test. > > Funny, I was looking at something rather similar recently, but never > > pushed on it because we were going to need too many entries in the > > parallel selector. > > > > I'm not convinced that we need the inner mode to match anything. As > > long as the vec_concat's mode is twice the size of the vec_select modes > > and the vec_select mode is <= the mode of its operands ISTM this is > > fine. We might want the modes of the vec_select to match, but I don't > > think that's strictly necessary either, they just need to be the same > > size. ie, we could have somethig like > If they're different sizes, i.e, something like below should also be legal? > (vec_concat:V8SF (vec_select:V2SF (reg:V16SF)) (vec_select:V6SF (reg:V16SF))) > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > > > I'm not sure if that level of generality is useful though. If we want > > the modes of the vec_selects to match I think we could still support > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > > > Thoughts? > > > > jeff > > > > Jeff > > > > > > > --- > > > gcc/simplify-rtx.c | 3 ++- > > > .../gcc.target/i386/avx512f-vect-rebuild.c | 21 +++++++++++++++++++ > > > gcc/testsuite/gcc.target/i386/vect-rebuild.c | 2 +- > > > 3 files changed, 24 insertions(+), 2 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > > index ebad5cb5a79..16286befd79 100644 > > > --- a/gcc/simplify-rtx.c > > > +++ b/gcc/simplify-rtx.c > > > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, > > > if (GET_CODE (trueop0) == VEC_SELECT > > > && GET_CODE (trueop1) == VEC_SELECT > > > && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) > > > - && GET_MODE (XEXP (trueop0, 0)) == mode) > > > + && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > > > + == GET_MODE_INNER(mode)) > > > { > > > rtx par0 = XEXP (trueop0, 1); > > > rtx par1 = XEXP (trueop1, 1); > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > > new file mode 100644 > > > index 00000000000..aef6855aa46 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c > > > @@ -0,0 +1,21 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */ > > > + > > > +typedef double v2df __attribute__ ((__vector_size__ (16))); > > > +typedef double v4df __attribute__ ((__vector_size__ (32))); > > > + > > > +v2df h (v4df x) > > > +{ > > > + v2df xx = { x[2], x[3] }; > > > + return xx; > > > +} > > > + > > > +v4df f2 (v4df x) > > > +{ > > > + v4df xx = { x[0], x[1], x[2], x[3] }; > > > + return xx; > > > +} > > > + > > > +/* { dg-final { scan-assembler-not "unpck" } } */ > > > +/* { dg-final { scan-assembler-not "valign" } } */ > > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ > > > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > > index 570967f6b5c..8e85b98bf1d 100644 > > > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c > > > @@ -30,4 +30,4 @@ v2df h (v4df x) > > > > > > /* { dg-final { scan-assembler-not "unpck" } } */ > > > /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ > > > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ > > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ > > > > > -- > BR, > Hongtao
On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote: > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > I'm not convinced that we need the inner mode to match anything. As > > long as the vec_concat's mode is twice the size of the vec_select modes > > and the vec_select mode is <= the mode of its operands ISTM this is > > fine. We might want the modes of the vec_select to match, but I don't > > think that's strictly necessary either, they just need to be the same > > size. ie, we could have somethig like > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > > > I'm not sure if that level of generality is useful though. If we want > > the modes of the vec_selects to match I think we could still support > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > > > Thoughts? > > I think the component or scalar modes of the elements to concat need to match > the component mode of the result. I don't think you example involving > a cat of DF and DI is too useful - but you could use a subreg around the DI > value ;) I agree. If you want to concatenate components of different modes, you should change mode first, using subregs for example. ("Inner mode" is something of subregs btw, "component mode" is what this concept of modes is called, the name GET_MODE_INNER is a bit confusing though :-) ) Btw, the documentation for "concat" says @findex concat @item (concat@var{m} @var{rtx} @var{rtx}) This RTX represents the concatenation of two other RTXs. This is used for complex values. It should only appear in the RTL attached to declarations and during RTL generation. It should not appear in the ordinary insn chain. which needs some updating (in many ways). Segher
On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote: > > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > I'm not convinced that we need the inner mode to match anything. As > > > long as the vec_concat's mode is twice the size of the vec_select modes > > > and the vec_select mode is <= the mode of its operands ISTM this is > > > fine. We might want the modes of the vec_select to match, but I don't > > > think that's strictly necessary either, they just need to be the same > > > size. ie, we could have somethig like > > > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > > > > > I'm not sure if that level of generality is useful though. If we want > > > the modes of the vec_selects to match I think we could still support > > > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > > > > > Thoughts? > > > > I think the component or scalar modes of the elements to concat need to match > > the component mode of the result. I don't think you example involving > > a cat of DF and DI is too useful - but you could use a subreg around the DI > > value ;) > > I agree. > > If you want to concatenate components of different modes, you should > change mode first, using subregs for example. I don't really understand. for > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > > > > > Thoughts? how can it be simplified when reg:V4DF is different from (reg:V8DF) to (vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF)) (paralle[...]) ?, which doesn't look like a simpication. Similar for > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so vec_concat (vec_select vec_select) can be simplified to just vec_select. > > ("Inner mode" is something of subregs btw, "component mode" is what this > concept of modes is called, the name GET_MODE_INNER is a bit confusing > though :-) ) > > Btw, the documentation for "concat" says > @findex concat > @item (concat@var{m} @var{rtx} @var{rtx}) > This RTX represents the concatenation of two other RTXs. This is used > for complex values. It should only appear in the RTL attached to > declarations and during RTL generation. It should not appear in the > ordinary insn chain. > which needs some updating (in many ways). > > > Segher
On Mon, Sep 27, 2021 at 11:42 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > > On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote: > > > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > > > I'm not convinced that we need the inner mode to match anything. As > > > > long as the vec_concat's mode is twice the size of the vec_select modes > > > > and the vec_select mode is <= the mode of its operands ISTM this is > > > > fine. We might want the modes of the vec_select to match, but I don't > > > > think that's strictly necessary either, they just need to be the same > > > > size. ie, we could have somethig like > > > > > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > > > > > > > I'm not sure if that level of generality is useful though. If we want > > > > the modes of the vec_selects to match I think we could still support > > > > > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > > > > > > > Thoughts? > > > > > > I think the component or scalar modes of the elements to concat need to match > > > the component mode of the result. I don't think you example involving > > > a cat of DF and DI is too useful - but you could use a subreg around the DI > > > value ;) > > > > I agree. > > > > If you want to concatenate components of different modes, you should > > change mode first, using subregs for example. > I don't really understand. > > for > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) > > > > > > > > Thoughts? > how can it be simplified when reg:V4DF is different from (reg:V8DF) > to > (vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF)) > (paralle[...]) > ?, which doesn't look like a simpication. > > Similar for > > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) > > here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so > vec_concat (vec_select vec_select) can be simplified to just > vec_select. Yes, I think your patch is reasonable, I don't understand how we can intermediately convert here either or how that would help. Quoting the original example: (set (reg:V2DF 87 [ xx ]) (vec_concat:V2DF (vec_select:DF (reg:V4DF 92) (parallel [ (const_int 2 [0x2]) ])) (vec_select:DF (reg:V4DF 92) (parallel [ (const_int 3 [0x3]) ])))) it's not about vec_concat of different modes but vec_concat with V2DF mode concated from vec_select of V4DF and V2DF and V4DF not matching up. But we can do (vec_select:V2DF (reg:V4DF ..) (parallel [... ])) just fine which this case doesn't handle. Richard. > > > > > ("Inner mode" is something of subregs btw, "component mode" is what this > > concept of modes is called, the name GET_MODE_INNER is a bit confusing > > though :-) ) > > > > Btw, the documentation for "concat" says > > @findex concat > > @item (concat@var{m} @var{rtx} @var{rtx}) > > This RTX represents the concatenation of two other RTXs. This is used > > for complex values. It should only appear in the RTL attached to > > declarations and during RTL generation. It should not appear in the > > ordinary insn chain. > > which needs some updating (in many ways). > > > > > > Segher > > > > -- > BR, > Hongtao
On 9/27/2021 6:07 AM, Richard Biener via Gcc-patches wrote: > On Mon, Sep 27, 2021 at 11:42 AM Hongtao Liu <crazylht@gmail.com> wrote: >> On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool >> <segher@kernel.crashing.org> wrote: >>> On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote: >>>> On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches >>>> <gcc-patches@gcc.gnu.org> wrote: >>>>> I'm not convinced that we need the inner mode to match anything. As >>>>> long as the vec_concat's mode is twice the size of the vec_select modes >>>>> and the vec_select mode is <= the mode of its operands ISTM this is >>>>> fine. We might want the modes of the vec_select to match, but I don't >>>>> think that's strictly necessary either, they just need to be the same >>>>> size. ie, we could have somethig like >>>>> >>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) >>>>> >>>>> I'm not sure if that level of generality is useful though. If we want >>>>> the modes of the vec_selects to match I think we could still support >>>>> >>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) >>>>> >>>>> Thoughts? >>>> I think the component or scalar modes of the elements to concat need to match >>>> the component mode of the result. I don't think you example involving >>>> a cat of DF and DI is too useful - but you could use a subreg around the DI >>>> value ;) >>> I agree. >>> >>> If you want to concatenate components of different modes, you should >>> change mode first, using subregs for example. >> I don't really understand. >> >> for >>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF))) >>>>> >>>>> Thoughts? >> how can it be simplified when reg:V4DF is different from (reg:V8DF) >> to >> (vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF)) >> (paralle[...]) >> ?, which doesn't look like a simpication. >> >> Similar for >>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI))) >> here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so >> vec_concat (vec_select vec_select) can be simplified to just >> vec_select. > Yes, I think your patch is reasonable, I don't understand how we can > intermediately convert here either or how that would help. Agreed. It was never my intention to hold things up. I was just curious about whether or not we wanted to relax things even further. Jeff
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index ebad5cb5a79..16286befd79 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, if (GET_CODE (trueop0) == VEC_SELECT && GET_CODE (trueop1) == VEC_SELECT && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) - && GET_MODE (XEXP (trueop0, 0)) == mode) + && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) + == GET_MODE_INNER(mode)) { rtx par0 = XEXP (trueop0, 1); rtx par1 = XEXP (trueop1, 1); diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c new file mode 100644 index 00000000000..aef6855aa46 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */ + +typedef double v2df __attribute__ ((__vector_size__ (16))); +typedef double v4df __attribute__ ((__vector_size__ (32))); + +v2df h (v4df x) +{ + v2df xx = { x[2], x[3] }; + return xx; +} + +v4df f2 (v4df x) +{ + v4df xx = { x[0], x[1], x[2], x[3] }; + return xx; +} + +/* { dg-final { scan-assembler-not "unpck" } } */ +/* { dg-final { scan-assembler-not "valign" } } */ +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c index 570967f6b5c..8e85b98bf1d 100644 --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c @@ -30,4 +30,4 @@ v2df h (v4df x) /* { dg-final { scan-assembler-not "unpck" } } */ /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */ -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */ +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */