diff mbox series

Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.

Message ID 20210910043632.1087666-1-hongtao.liu@intel.com
State New
Headers show
Series Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode. | expand

Commit Message

liuhongt Sept. 10, 2021, 4:36 a.m. UTC
Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
optimizer wouldn't simplify if op0 has different mode with M, but that's too
restrict which will prevent below optimization, the condition can be relaxed
to op0 must have same inner mode with M.

(set (reg:V2DF 87 [ xx ])
    (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 2 [0x2])
                ]))
        (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 3 [0x3])
                ]))))

  Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

	* simplify-rtx.c
	(simplify_context::simplify_binary_operation_1): Relax
	condition of simplifying (vec_concat:M (vec_select op0
	index0)(vec_select op1 index1)) to allow different modes
	between op0 and M, but have same inner mode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-rebuild.c:
	* gcc.target/i386/avx512f-vect-rebuild.c: New test.
---
 gcc/simplify-rtx.c                            |  3 ++-
 .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
 3 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c

Comments

Jeff Law Sept. 13, 2021, 2:09 p.m. UTC | #1
On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
>    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
> optimizer wouldn't simplify if op0 has different mode with M, but that's too
> restrict which will prevent below optimization, the condition can be relaxed
> to op0 must have same inner mode with M.
>
> (set (reg:V2DF 87 [ xx ])
>      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
>              (parallel [
>                      (const_int 2 [0x2])
>                  ]))
>          (vec_select:DF (reg:V4DF 92)
>              (parallel [
>                      (const_int 3 [0x3])
>                  ]))))
>
>    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
>    Ok for trunk?
>
> gcc/ChangeLog:
>
> 	* simplify-rtx.c
> 	(simplify_context::simplify_binary_operation_1): Relax
> 	condition of simplifying (vec_concat:M (vec_select op0
> 	index0)(vec_select op1 index1)) to allow different modes
> 	between op0 and M, but have same inner mode.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/vect-rebuild.c:
> 	* gcc.target/i386/avx512f-vect-rebuild.c: New test.
Funny, I was looking at something rather similar recently, but never 
pushed on it because we were going to need too many entries in the 
parallel selector.

I'm not convinced that we need the inner mode to match anything.  As 
long as the vec_concat's mode is twice the size of the vec_select modes 
and the vec_select mode is <= the mode of its operands ISTM this is 
fine.   We  might want the modes of the vec_select to match, but I don't 
think that's strictly necessary either, they just need to be the same 
size.  ie, we could have somethig like

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

I'm not sure if that level of generality is useful though.  If we want 
the modes of the vec_selects to match I think we could still support

(vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))

Thoughts?

jeff

Jeff


> ---
>   gcc/simplify-rtx.c                            |  3 ++-
>   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
>   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
>   3 files changed, 24 insertions(+), 2 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index ebad5cb5a79..16286befd79 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>   	if (GET_CODE (trueop0) == VEC_SELECT
>   	    && GET_CODE (trueop1) == VEC_SELECT
>   	    && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
> -	    && GET_MODE (XEXP (trueop0, 0)) == mode)
> +	    && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> +	       == GET_MODE_INNER(mode))
>   	  {
>   	    rtx par0 = XEXP (trueop0, 1);
>   	    rtx par1 = XEXP (trueop1, 1);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> new file mode 100644
> index 00000000000..aef6855aa46
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
> +
> +typedef double v2df __attribute__ ((__vector_size__ (16)));
> +typedef double v4df __attribute__ ((__vector_size__ (32)));
> +
> +v2df h (v4df x)
> +{
> +  v2df xx = { x[2], x[3] };
> +  return xx;
> +}
> +
> +v4df f2 (v4df x)
> +{
> +  v4df xx = { x[0], x[1], x[2], x[3] };
> +  return xx;
> +}
> +
> +/* { dg-final { scan-assembler-not "unpck" } } */
> +/* { dg-final { scan-assembler-not "valign" } } */
> +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> index 570967f6b5c..8e85b98bf1d 100644
> --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> @@ -30,4 +30,4 @@ v2df h (v4df x)
>   
>   /* { dg-final { scan-assembler-not "unpck" } } */
>   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
> -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
Richard Biener Sept. 13, 2021, 2:24 p.m. UTC | #2
On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
> >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
> > optimizer wouldn't simplify if op0 has different mode with M, but that's too
> > restrict which will prevent below optimization, the condition can be relaxed
> > to op0 must have same inner mode with M.
> >
> > (set (reg:V2DF 87 [ xx ])
> >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
> >              (parallel [
> >                      (const_int 2 [0x2])
> >                  ]))
> >          (vec_select:DF (reg:V4DF 92)
> >              (parallel [
> >                      (const_int 3 [0x3])
> >                  ]))))
> >
> >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
> >    Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >       * simplify-rtx.c
> >       (simplify_context::simplify_binary_operation_1): Relax
> >       condition of simplifying (vec_concat:M (vec_select op0
> >       index0)(vec_select op1 index1)) to allow different modes
> >       between op0 and M, but have same inner mode.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/vect-rebuild.c:
> >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.
> Funny, I was looking at something rather similar recently, but never
> pushed on it because we were going to need too many entries in the
> parallel selector.
>
> I'm not convinced that we need the inner mode to match anything.  As
> long as the vec_concat's mode is twice the size of the vec_select modes
> and the vec_select mode is <= the mode of its operands ISTM this is
> fine.   We  might want the modes of the vec_select to match, but I don't
> think that's strictly necessary either, they just need to be the same
> size.  ie, we could have somethig like
>
> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>
> I'm not sure if that level of generality is useful though.  If we want
> the modes of the vec_selects to match I think we could still support
>
> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
>
> Thoughts?

I think the component or scalar modes of the elements to concat need to match
the component mode of the result.  I don't think you example involving
a cat of DF and DI is too useful - but you could use a subreg around the DI
value ;)

Richard.

>
> jeff
>
> Jeff
>
>
> > ---
> >   gcc/simplify-rtx.c                            |  3 ++-
> >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
> >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
> >   3 files changed, 24 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> >
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index ebad5cb5a79..16286befd79 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >       if (GET_CODE (trueop0) == VEC_SELECT
> >           && GET_CODE (trueop1) == VEC_SELECT
> >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
> > -         && GET_MODE (XEXP (trueop0, 0)) == mode)
> > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> > +            == GET_MODE_INNER(mode))
> >         {
> >           rtx par0 = XEXP (trueop0, 1);
> >           rtx par1 = XEXP (trueop1, 1);
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > new file mode 100644
> > index 00000000000..aef6855aa46
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
> > +
> > +typedef double v2df __attribute__ ((__vector_size__ (16)));
> > +typedef double v4df __attribute__ ((__vector_size__ (32)));
> > +
> > +v2df h (v4df x)
> > +{
> > +  v2df xx = { x[2], x[3] };
> > +  return xx;
> > +}
> > +
> > +v4df f2 (v4df x)
> > +{
> > +  v4df xx = { x[0], x[1], x[2], x[3] };
> > +  return xx;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "unpck" } } */
> > +/* { dg-final { scan-assembler-not "valign" } } */
> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > index 570967f6b5c..8e85b98bf1d 100644
> > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > @@ -30,4 +30,4 @@ v2df h (v4df x)
> >
> >   /* { dg-final { scan-assembler-not "unpck" } } */
> >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
> > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
>
Hongtao Liu Sept. 13, 2021, 3:09 p.m. UTC | #3
On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
> >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
> > optimizer wouldn't simplify if op0 has different mode with M, but that's too
> > restrict which will prevent below optimization, the condition can be relaxed
> > to op0 must have same inner mode with M.
> >
> > (set (reg:V2DF 87 [ xx ])
> >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
> >              (parallel [
> >                      (const_int 2 [0x2])
> >                  ]))
> >          (vec_select:DF (reg:V4DF 92)
> >              (parallel [
> >                      (const_int 3 [0x3])
> >                  ]))))
> >
> >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
> >    Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >       * simplify-rtx.c
> >       (simplify_context::simplify_binary_operation_1): Relax
> >       condition of simplifying (vec_concat:M (vec_select op0
> >       index0)(vec_select op1 index1)) to allow different modes
> >       between op0 and M, but have same inner mode.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/vect-rebuild.c:
> >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.
> Funny, I was looking at something rather similar recently, but never
> pushed on it because we were going to need too many entries in the
> parallel selector.
>
> I'm not convinced that we need the inner mode to match anything.  As
> long as the vec_concat's mode is twice the size of the vec_select modes
> and the vec_select mode is <= the mode of its operands ISTM this is
> fine.   We  might want the modes of the vec_select to match, but I don't
> think that's strictly necessary either, they just need to be the same
> size.  ie, we could have somethig like
>
> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>
> I'm not sure if that level of generality is useful though.  If we want
> the modes of the vec_selects to match I think we could still support
>
> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
The first operand of vec_select is required to be the same here. and I
guess (vec_select:DF (subreg:V4DF (reg:V8DF) 0) will be simplified to
(vec_select:DF (reg:V8DF))?
>
> Thoughts?
>
> jeff
>
> Jeff
>
>
> > ---
> >   gcc/simplify-rtx.c                            |  3 ++-
> >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
> >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
> >   3 files changed, 24 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> >
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index ebad5cb5a79..16286befd79 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >       if (GET_CODE (trueop0) == VEC_SELECT
> >           && GET_CODE (trueop1) == VEC_SELECT
> >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
> > -         && GET_MODE (XEXP (trueop0, 0)) == mode)
> > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> > +            == GET_MODE_INNER(mode))
> >         {
> >           rtx par0 = XEXP (trueop0, 1);
> >           rtx par1 = XEXP (trueop1, 1);
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > new file mode 100644
> > index 00000000000..aef6855aa46
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
> > +
> > +typedef double v2df __attribute__ ((__vector_size__ (16)));
> > +typedef double v4df __attribute__ ((__vector_size__ (32)));
> > +
> > +v2df h (v4df x)
> > +{
> > +  v2df xx = { x[2], x[3] };
> > +  return xx;
> > +}
> > +
> > +v4df f2 (v4df x)
> > +{
> > +  v4df xx = { x[0], x[1], x[2], x[3] };
> > +  return xx;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "unpck" } } */
> > +/* { dg-final { scan-assembler-not "valign" } } */
> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > index 570967f6b5c..8e85b98bf1d 100644
> > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > @@ -30,4 +30,4 @@ v2df h (v4df x)
> >
> >   /* { dg-final { scan-assembler-not "unpck" } } */
> >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
> > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
>
Hongtao Liu Sept. 13, 2021, 3:19 p.m. UTC | #4
On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
> >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
> > optimizer wouldn't simplify if op0 has different mode with M, but that's too
> > restrict which will prevent below optimization, the condition can be relaxed
> > to op0 must have same inner mode with M.
> >
> > (set (reg:V2DF 87 [ xx ])
> >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
> >              (parallel [
> >                      (const_int 2 [0x2])
> >                  ]))
> >          (vec_select:DF (reg:V4DF 92)
> >              (parallel [
> >                      (const_int 3 [0x3])
> >                  ]))))
> >
> >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
> >    Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >       * simplify-rtx.c
> >       (simplify_context::simplify_binary_operation_1): Relax
> >       condition of simplifying (vec_concat:M (vec_select op0
> >       index0)(vec_select op1 index1)) to allow different modes
> >       between op0 and M, but have same inner mode.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/i386/vect-rebuild.c:
> >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.
> Funny, I was looking at something rather similar recently, but never
> pushed on it because we were going to need too many entries in the
> parallel selector.
>
> I'm not convinced that we need the inner mode to match anything.  As
> long as the vec_concat's mode is twice the size of the vec_select modes
> and the vec_select mode is <= the mode of its operands ISTM this is
> fine.   We  might want the modes of the vec_select to match, but I don't
> think that's strictly necessary either, they just need to be the same
> size.  ie, we could have somethig like
If they're different sizes, i.e, something like below should also be legal?
(vec_concat:V8SF (vec_select:V2SF (reg:V16SF)) (vec_select:V6SF (reg:V16SF)))
>
> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>
> I'm not sure if that level of generality is useful though.  If we want
> the modes of the vec_selects to match I think we could still support
>
> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
>
> Thoughts?
>
> jeff
>
> Jeff
>
>
> > ---
> >   gcc/simplify-rtx.c                            |  3 ++-
> >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
> >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
> >   3 files changed, 24 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> >
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index ebad5cb5a79..16286befd79 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >       if (GET_CODE (trueop0) == VEC_SELECT
> >           && GET_CODE (trueop1) == VEC_SELECT
> >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
> > -         && GET_MODE (XEXP (trueop0, 0)) == mode)
> > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> > +            == GET_MODE_INNER(mode))
> >         {
> >           rtx par0 = XEXP (trueop0, 1);
> >           rtx par1 = XEXP (trueop1, 1);
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > new file mode 100644
> > index 00000000000..aef6855aa46
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
> > +
> > +typedef double v2df __attribute__ ((__vector_size__ (16)));
> > +typedef double v4df __attribute__ ((__vector_size__ (32)));
> > +
> > +v2df h (v4df x)
> > +{
> > +  v2df xx = { x[2], x[3] };
> > +  return xx;
> > +}
> > +
> > +v4df f2 (v4df x)
> > +{
> > +  v4df xx = { x[0], x[1], x[2], x[3] };
> > +  return xx;
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "unpck" } } */
> > +/* { dg-final { scan-assembler-not "valign" } } */
> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > index 570967f6b5c..8e85b98bf1d 100644
> > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > @@ -30,4 +30,4 @@ v2df h (v4df x)
> >
> >   /* { dg-final { scan-assembler-not "unpck" } } */
> >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
> > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
>
Hongtao Liu Sept. 24, 2021, 10:58 a.m. UTC | #5
ping

On Mon, Sep 13, 2021 at 11:19 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Sep 13, 2021 at 10:10 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 9/9/2021 10:36 PM, liuhongt via Gcc-patches wrote:
> > >    Currently for (vec_concat:M (vec_select op0 idx1)(vec_select op0 idx2)),
> > > optimizer wouldn't simplify if op0 has different mode with M, but that's too
> > > restrict which will prevent below optimization, the condition can be relaxed
> > > to op0 must have same inner mode with M.
> > >
> > > (set (reg:V2DF 87 [ xx ])
> > >      (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
> > >              (parallel [
> > >                      (const_int 2 [0x2])
> > >                  ]))
> > >          (vec_select:DF (reg:V4DF 92)
> > >              (parallel [
> > >                      (const_int 3 [0x3])
> > >                  ]))))
> > >
> > >    Bootsrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >    Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >       * simplify-rtx.c
> > >       (simplify_context::simplify_binary_operation_1): Relax
> > >       condition of simplifying (vec_concat:M (vec_select op0
> > >       index0)(vec_select op1 index1)) to allow different modes
> > >       between op0 and M, but have same inner mode.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/i386/vect-rebuild.c:
> > >       * gcc.target/i386/avx512f-vect-rebuild.c: New test.
> > Funny, I was looking at something rather similar recently, but never
> > pushed on it because we were going to need too many entries in the
> > parallel selector.
> >
> > I'm not convinced that we need the inner mode to match anything.  As
> > long as the vec_concat's mode is twice the size of the vec_select modes
> > and the vec_select mode is <= the mode of its operands ISTM this is
> > fine.   We  might want the modes of the vec_select to match, but I don't
> > think that's strictly necessary either, they just need to be the same
> > size.  ie, we could have somethig like
> If they're different sizes, i.e, something like below should also be legal?
> (vec_concat:V8SF (vec_select:V2SF (reg:V16SF)) (vec_select:V6SF (reg:V16SF)))
> >
> > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
> >
> > I'm not sure if that level of generality is useful though.  If we want
> > the modes of the vec_selects to match I think we could still support
> >
> > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> >
> > Thoughts?
> >
> > jeff
> >
> > Jeff
> >
> >
> > > ---
> > >   gcc/simplify-rtx.c                            |  3 ++-
> > >   .../gcc.target/i386/avx512f-vect-rebuild.c    | 21 +++++++++++++++++++
> > >   gcc/testsuite/gcc.target/i386/vect-rebuild.c  |  2 +-
> > >   3 files changed, 24 insertions(+), 2 deletions(-)
> > >   create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > >
> > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > > index ebad5cb5a79..16286befd79 100644
> > > --- a/gcc/simplify-rtx.c
> > > +++ b/gcc/simplify-rtx.c
> > > @@ -4587,7 +4587,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> > >       if (GET_CODE (trueop0) == VEC_SELECT
> > >           && GET_CODE (trueop1) == VEC_SELECT
> > >           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
> > > -         && GET_MODE (XEXP (trueop0, 0)) == mode)
> > > +         && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> > > +            == GET_MODE_INNER(mode))
> > >         {
> > >           rtx par0 = XEXP (trueop0, 1);
> > >           rtx par1 = XEXP (trueop1, 1);
> > > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > > new file mode 100644
> > > index 00000000000..aef6855aa46
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
> > > +
> > > +typedef double v2df __attribute__ ((__vector_size__ (16)));
> > > +typedef double v4df __attribute__ ((__vector_size__ (32)));
> > > +
> > > +v2df h (v4df x)
> > > +{
> > > +  v2df xx = { x[2], x[3] };
> > > +  return xx;
> > > +}
> > > +
> > > +v4df f2 (v4df x)
> > > +{
> > > +  v4df xx = { x[0], x[1], x[2], x[3] };
> > > +  return xx;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not "unpck" } } */
> > > +/* { dg-final { scan-assembler-not "valign" } } */
> > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > > index 570967f6b5c..8e85b98bf1d 100644
> > > --- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
> > > @@ -30,4 +30,4 @@ v2df h (v4df x)
> > >
> > >   /* { dg-final { scan-assembler-not "unpck" } } */
> > >   /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
> > > -/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
> > > +/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
> >
>
>
> --
> BR,
> Hongtao
Segher Boessenkool Sept. 24, 2021, 1:06 p.m. UTC | #6
On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:
> On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> > I'm not convinced that we need the inner mode to match anything.  As
> > long as the vec_concat's mode is twice the size of the vec_select modes
> > and the vec_select mode is <= the mode of its operands ISTM this is
> > fine.   We  might want the modes of the vec_select to match, but I don't
> > think that's strictly necessary either, they just need to be the same
> > size.  ie, we could have somethig like
> >
> > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
> >
> > I'm not sure if that level of generality is useful though.  If we want
> > the modes of the vec_selects to match I think we could still support
> >
> > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> >
> > Thoughts?
> 
> I think the component or scalar modes of the elements to concat need to match
> the component mode of the result.  I don't think you example involving
> a cat of DF and DI is too useful - but you could use a subreg around the DI
> value ;)

I agree.

If you want to concatenate components of different modes, you should
change mode first, using subregs for example.

("Inner mode" is something of subregs btw, "component mode" is what this
concept of modes is called, the name GET_MODE_INNER is a bit confusing
though :-) )

Btw, the documentation for "concat" says
  @findex concat
  @item (concat@var{m} @var{rtx} @var{rtx})
  This RTX represents the concatenation of two other RTXs.  This is used
  for complex values.  It should only appear in the RTL attached to
  declarations and during RTL generation.  It should not appear in the
  ordinary insn chain.
which needs some updating (in many ways).


Segher
Hongtao Liu Sept. 27, 2021, 9:49 a.m. UTC | #7
On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:
> > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > > I'm not convinced that we need the inner mode to match anything.  As
> > > long as the vec_concat's mode is twice the size of the vec_select modes
> > > and the vec_select mode is <= the mode of its operands ISTM this is
> > > fine.   We  might want the modes of the vec_select to match, but I don't
> > > think that's strictly necessary either, they just need to be the same
> > > size.  ie, we could have somethig like
> > >
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
> > >
> > > I'm not sure if that level of generality is useful though.  If we want
> > > the modes of the vec_selects to match I think we could still support
> > >
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > >
> > > Thoughts?
> >
> > I think the component or scalar modes of the elements to concat need to match
> > the component mode of the result.  I don't think you example involving
> > a cat of DF and DI is too useful - but you could use a subreg around the DI
> > value ;)
>
> I agree.
>
> If you want to concatenate components of different modes, you should
> change mode first, using subregs for example.
I don't really understand.

for
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > >
> > > Thoughts?
how can it be simplified when reg:V4DF is different from (reg:V8DF)
to
(vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF))
   (paralle[...])
?, which doesn't look like a simpication.

Similar for
> > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))

here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so
vec_concat (vec_select vec_select) can be simplified to just
vec_select.

>
> ("Inner mode" is something of subregs btw, "component mode" is what this
> concept of modes is called, the name GET_MODE_INNER is a bit confusing
> though :-) )
>
> Btw, the documentation for "concat" says
>   @findex concat
>   @item (concat@var{m} @var{rtx} @var{rtx})
>   This RTX represents the concatenation of two other RTXs.  This is used
>   for complex values.  It should only appear in the RTL attached to
>   declarations and during RTL generation.  It should not appear in the
>   ordinary insn chain.
> which needs some updating (in many ways).
>
>
> Segher
Richard Biener Sept. 27, 2021, 12:07 p.m. UTC | #8
On Mon, Sep 27, 2021 at 11:42 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:
> > > On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > > I'm not convinced that we need the inner mode to match anything.  As
> > > > long as the vec_concat's mode is twice the size of the vec_select modes
> > > > and the vec_select mode is <= the mode of its operands ISTM this is
> > > > fine.   We  might want the modes of the vec_select to match, but I don't
> > > > think that's strictly necessary either, they just need to be the same
> > > > size.  ie, we could have somethig like
> > > >
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
> > > >
> > > > I'm not sure if that level of generality is useful though.  If we want
> > > > the modes of the vec_selects to match I think we could still support
> > > >
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > > >
> > > > Thoughts?
> > >
> > > I think the component or scalar modes of the elements to concat need to match
> > > the component mode of the result.  I don't think you example involving
> > > a cat of DF and DI is too useful - but you could use a subreg around the DI
> > > value ;)
> >
> > I agree.
> >
> > If you want to concatenate components of different modes, you should
> > change mode first, using subregs for example.
> I don't really understand.
>
> for
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
> > > >
> > > > Thoughts?
> how can it be simplified when reg:V4DF is different from (reg:V8DF)
> to
> (vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF))
>    (paralle[...])
> ?, which doesn't look like a simpication.
>
> Similar for
> > > > (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>
> here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so
> vec_concat (vec_select vec_select) can be simplified to just
> vec_select.

Yes, I think your patch is reasonable, I don't understand how we can
intermediately convert here either or how that would help.

Quoting the original example:

(set (reg:V2DF 87 [ xx ])
    (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 2 [0x2])
                ]))
        (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 3 [0x3])
                ]))))

it's not about vec_concat of different modes but vec_concat with V2DF
mode concated from vec_select of V4DF and V2DF and V4DF not matching
up.  But we can do (vec_select:V2DF (reg:V4DF ..) (parallel [... ])) just
fine which this case doesn't handle.

Richard.


>
> >
> > ("Inner mode" is something of subregs btw, "component mode" is what this
> > concept of modes is called, the name GET_MODE_INNER is a bit confusing
> > though :-) )
> >
> > Btw, the documentation for "concat" says
> >   @findex concat
> >   @item (concat@var{m} @var{rtx} @var{rtx})
> >   This RTX represents the concatenation of two other RTXs.  This is used
> >   for complex values.  It should only appear in the RTL attached to
> >   declarations and during RTL generation.  It should not appear in the
> >   ordinary insn chain.
> > which needs some updating (in many ways).
> >
> >
> > Segher
>
>
>
> --
> BR,
> Hongtao
Jeff Law Sept. 28, 2021, 1:48 a.m. UTC | #9
On 9/27/2021 6:07 AM, Richard Biener via Gcc-patches wrote:
> On Mon, Sep 27, 2021 at 11:42 AM Hongtao Liu <crazylht@gmail.com> wrote:
>> On Fri, Sep 24, 2021 at 9:08 PM Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>>> On Mon, Sep 13, 2021 at 04:24:13PM +0200, Richard Biener wrote:
>>>> On Mon, Sep 13, 2021 at 4:10 PM Jeff Law via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>> I'm not convinced that we need the inner mode to match anything.  As
>>>>> long as the vec_concat's mode is twice the size of the vec_select modes
>>>>> and the vec_select mode is <= the mode of its operands ISTM this is
>>>>> fine.   We  might want the modes of the vec_select to match, but I don't
>>>>> think that's strictly necessary either, they just need to be the same
>>>>> size.  ie, we could have somethig like
>>>>>
>>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>>>>>
>>>>> I'm not sure if that level of generality is useful though.  If we want
>>>>> the modes of the vec_selects to match I think we could still support
>>>>>
>>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
>>>>>
>>>>> Thoughts?
>>>> I think the component or scalar modes of the elements to concat need to match
>>>> the component mode of the result.  I don't think you example involving
>>>> a cat of DF and DI is too useful - but you could use a subreg around the DI
>>>> value ;)
>>> I agree.
>>>
>>> If you want to concatenate components of different modes, you should
>>> change mode first, using subregs for example.
>> I don't really understand.
>>
>> for
>>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DF (reg:V8DF)))
>>>>>
>>>>> Thoughts?
>> how can it be simplified when reg:V4DF is different from (reg:V8DF)
>> to
>> (vec_select: (vec_concat:(subreg V8DF (reg:V4DF) 0) (reg:V8DF))
>>     (paralle[...])
>> ?, which doesn't look like a simpication.
>>
>> Similar for
>>>>> (vec_concat:V2DF (vec_select:DF (reg:V4DF)) (vec_select:DI (reg:V4DI)))
>> here we require rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)) so
>> vec_concat (vec_select vec_select) can be simplified to just
>> vec_select.
> Yes, I think your patch is reasonable, I don't understand how we can
> intermediately convert here either or how that would help.
Agreed.  It was never my intention to hold things up.  I was just 
curious about whether or not we wanted to relax things even further.

Jeff
diff mbox series

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb5a79..16286befd79 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4587,7 +4587,8 @@  simplify_context::simplify_binary_operation_1 (rtx_code code,
 	if (GET_CODE (trueop0) == VEC_SELECT
 	    && GET_CODE (trueop1) == VEC_SELECT
 	    && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0))
-	    && GET_MODE (XEXP (trueop0, 0)) == mode)
+	    && GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
+	       == GET_MODE_INNER(mode))
 	  {
 	    rtx par0 = XEXP (trueop0, 1);
 	    rtx par1 = XEXP (trueop1, 1);
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
new file mode 100644
index 00000000000..aef6855aa46
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-rebuild.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -mavx512vl -mavx512dq -fno-tree-forwprop" } */
+
+typedef double v2df __attribute__ ((__vector_size__ (16)));
+typedef double v4df __attribute__ ((__vector_size__ (32)));
+
+v2df h (v4df x)
+{
+  v2df xx = { x[2], x[3] };
+  return xx;
+}
+
+v4df f2 (v4df x)
+{
+  v4df xx = { x[0], x[1], x[2], x[3] };
+  return xx;
+}
+
+/* { dg-final { scan-assembler-not "unpck" } } */
+/* { dg-final { scan-assembler-not "valign" } } */
+/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-rebuild.c b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
index 570967f6b5c..8e85b98bf1d 100644
--- a/gcc/testsuite/gcc.target/i386/vect-rebuild.c
+++ b/gcc/testsuite/gcc.target/i386/vect-rebuild.c
@@ -30,4 +30,4 @@  v2df h (v4df x)
 
 /* { dg-final { scan-assembler-not "unpck" } } */
 /* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */
-/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */
+/* { dg-final { scan-assembler-times "\tv?extract(?:f128|f64x2)\[ \t\]" 1 } } */