diff mbox series

[PR,rtl-optimization/97249] Simplify vec_select of paradoxical subreg.

Message ID CAMZc-byW9O2vwwozyNueO+7YPPdh54B54ne3fd+_9MnDEcfz0w@mail.gmail.com
State New
Headers show
Series [PR,rtl-optimization/97249] Simplify vec_select of paradoxical subreg. | expand

Commit Message

Hongtao Liu Oct. 13, 2020, 8:40 a.m. UTC
Hi:
  For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
                   (parallel [(const_int 0) (const_int 1)]))
 it could be simplified as inner.

  Bootstrap is ok, regression test on i386 backend is ok.

gcc/ChangeLog
        PR rtl-optimization/97249
        * simplify-rtx.c (simplify_binary_operation_1): Simplify
        vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

        * gcc.target/i386/pr97249-1.c: New test.

Comments

Segher Boessenkool Oct. 13, 2020, 7:59 p.m. UTC | #1
Hi!

On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
>   For rtx like
>   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>                    (parallel [(const_int 0) (const_int 1)]))
>  it could be simplified as inner.

You could even simplify any vec_select of a subreg of X to just a
vec_select of X, by changing the selection vector a bit (well, only do
this if that is a constant vector, I suppose).  Not just for paradoxical
subregs either, just for *all* subregs.

> gcc/ChangeLog
>         PR rtl-optimization/97249
>         * simplify-rtx.c (simplify_binary_operation_1): Simplify
>         vec_select of paradoxical subreg.
> 
> gcc/testsuite/ChangeLog
> 
>         * gcc.target/i386/pr97249-1.c: New test.

> +	  /* For cases like
> +	     (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> +			      (parallel [(const_int 0) (const_int 1)])).
> +	     return inner directly.  */
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && paradoxical_subreg_p (trueop0)
> +	      && mode == GET_MODE (XEXP (trueop0, 0))
> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && l0 % l1 == 0)

Why this?  Why does the number of elements of the input have to divide
that of the output?

> +	    {
> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +	      unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> +	      unsigned HOST_WIDE_INT sel = 0;
> +	      int i = 0;
> +	      for (;i != l1; i++)

  for (int i = 0; i != l1; i++)

> +		{
> +		  rtx j = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (j))
> +		    break;
> +		  sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> +		}
> +	      /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> +	      if (sel == expect)
> +		return XEXP (trueop0, 0);
> +	    }
>  	}

If you just handle the much more generic case, all the other vec_select
simplifications can be done as well, not just this one.

> +/* PR target/97249  */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */

I don't know enough about the x86 backend to know if this is exactly
what you need in the testsuite.  I do know a case of backslashitis when
I see one though -- you might want to use {} instead of "", and perhaps
\m and \M and \s etc.  And to make sure things are on one line, don't do
all that nastiness with [^\n], just start the RE with (?n) :-)


Segher
Hongtao Liu Oct. 14, 2020, 5:43 a.m. UTC | #2
On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> >   For rtx like
> >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> >                    (parallel [(const_int 0) (const_int 1)]))
> >  it could be simplified as inner.
>
> You could even simplify any vec_select of a subreg of X to just a
> vec_select of X, by changing the selection vector a bit (well, only do

Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.

> this if that is a constant vector, I suppose).  Not just for paradoxical
> subregs either, just for *all* subregs.
>

Yes, and only when X has the same inner mode and more elements.

> > gcc/ChangeLog
> >         PR rtl-optimization/97249
> >         * simplify-rtx.c (simplify_binary_operation_1): Simplify
> >         vec_select of paradoxical subreg.
> >
> > gcc/testsuite/ChangeLog
> >
> >         * gcc.target/i386/pr97249-1.c: New test.
>
> > +       /* For cases like
> > +          (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > +                           (parallel [(const_int 0) (const_int 1)])).
> > +          return inner directly.  */
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && paradoxical_subreg_p (trueop0)
> > +           && mode == GET_MODE (XEXP (trueop0, 0))
> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && l0 % l1 == 0)
>
> Why this?  Why does the number of elements of the input have to divide
> that of the output?
>

Removed, also add condition for my upper comments.

> > +         {
> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> > +           unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> > +           unsigned HOST_WIDE_INT sel = 0;
> > +           int i = 0;
> > +           for (;i != l1; i++)
>
>   for (int i = 0; i != l1; i++)
>
> > +             {
> > +               rtx j = XVECEXP (trueop1, 0, i);
> > +               if (!CONST_INT_P (j))
> > +                 break;
> > +               sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> > +             }
> > +           /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> > +           if (sel == expect)
> > +             return XEXP (trueop0, 0);
> > +         }
> >       }
>
> If you just handle the much more generic case, all the other vec_select
> simplifications can be done as well, not just this one.
>

Yes, changed, also selection should be inside the elements of X.

> > +/* PR target/97249  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
>
> I don't know enough about the x86 backend to know if this is exactly
> what you need in the testsuite.  I do know a case of backslashitis when
> I see one though -- you might want to use {} instead of "", and perhaps
> \m and \M and \s etc.  And to make sure things are on one line, don't do
> all that nastiness with [^\n], just start the RE with (?n) :-)
>

Yes, changed and it's very clean with usage of (?n) and {}.

>
> Segher

Update patch.
Segher Boessenkool Oct. 14, 2020, 5:35 p.m. UTC | #3
Hi!

On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > >   For rtx like
> > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > >                    (parallel [(const_int 0) (const_int 1)]))
> > >  it could be simplified as inner.
> >
> > You could even simplify any vec_select of a subreg of X to just a
> > vec_select of X, by changing the selection vector a bit (well, only do
> 
> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.

Exactly.

> > this if that is a constant vector, I suppose).  Not just for paradoxical
> > subregs either, just for *all* subregs.
> 
> Yes, and only when X has the same inner mode and more elements.

No, for *all*.  The mode of the first argument of vec_select does not
have to equal its result mode.

Any (constant indices) vec_select of a subreg can be written as just a
vec_select.

> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when available.  */

What does "when available" mean here?

> +	  int l2;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && (GET_MODE_INNER (mode)
> +		  == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))

Don't use unnecessary parens here please, it makes it harder to read
(there are quite enough parens already :-) )

> +	      gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> +					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
> +					   &subreg_offset));

Why is this needed?

> +	      bool success = true;
> +	      for (int i = 0;i != l1; i++)

(space after ; )

> +		{
> +		  rtx j = XVECEXP (trueop1, 0, i);

(i and j and k ususally are integers, not rtx)

> +		  if (!CONST_INT_P (j)
> +		      || known_ge (UINTVAL (j), l2 - subreg_offset))
> +		    {
> +		      success = false;
> +		      break;
> +		    }
> +		}

You don't have to test if the input RTL is valid.  You can assume it is.

> +	      if (success)
> +		{
> +		  rtx par = trueop1;
> +		  if (subreg_offset)
> +		    {
> +		      rtvec vec = rtvec_alloc (l1);
> +		      for (int i = 0; i < l1; i++)
> +			RTVEC_ELT (vec, i)
> +			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> +					     + subreg_offset));
> +		      par = gen_rtx_PARALLEL (VOIDmode, vec);
> +		    }
> +		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> +		}
> +	    }

subreg_offset will differ in meaning if big-endian; is this correct
there, do all the stars align so this code works out fine there as well?

Looks fine otherwise, thanks :-)


Segher
Richard Biener Oct. 14, 2020, 5:55 p.m. UTC | #4
On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>Hi!
>
>On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
>> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
>> > >   For rtx like
>> > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>> > >                    (parallel [(const_int 0) (const_int 1)]))
>> > >  it could be simplified as inner.
>> >
>> > You could even simplify any vec_select of a subreg of X to just a
>> > vec_select of X, by changing the selection vector a bit (well, only
>do
>> 
>> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to
>selection.
>
>Exactly.
>
>> > this if that is a constant vector, I suppose).  Not just for
>paradoxical
>> > subregs either, just for *all* subregs.
>> 
>> Yes, and only when X has the same inner mode and more elements.
>
>No, for *all*.  The mode of the first argument of vec_select does not
>have to equal its result mode.

But IIRC the component mode needs to match. 

>Any (constant indices) vec_select of a subreg can be written as just a
>vec_select.
>
>> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
>> +	     when available.  */
>
>What does "when available" mean here?
>
>> +	  int l2;
>> +	  if (GET_CODE (trueop0) == SUBREG
>> +	      && (GET_MODE_INNER (mode)
>> +		  == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
>
>Don't use unnecessary parens here please, it makes it harder to read
>(there are quite enough parens already :-) )
>
>> +	      gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
>> +					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
>> +					   &subreg_offset));
>
>Why is this needed?
>
>> +	      bool success = true;
>> +	      for (int i = 0;i != l1; i++)
>
>(space after ; )
>
>> +		{
>> +		  rtx j = XVECEXP (trueop1, 0, i);
>
>(i and j and k ususally are integers, not rtx)
>
>> +		  if (!CONST_INT_P (j)
>> +		      || known_ge (UINTVAL (j), l2 - subreg_offset))
>> +		    {
>> +		      success = false;
>> +		      break;
>> +		    }
>> +		}
>
>You don't have to test if the input RTL is valid.  You can assume it
>is.
>
>> +	      if (success)
>> +		{
>> +		  rtx par = trueop1;
>> +		  if (subreg_offset)
>> +		    {
>> +		      rtvec vec = rtvec_alloc (l1);
>> +		      for (int i = 0; i < l1; i++)
>> +			RTVEC_ELT (vec, i)
>> +			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
>> +					     + subreg_offset));
>> +		      par = gen_rtx_PARALLEL (VOIDmode, vec);
>> +		    }
>> +		  return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
>> +		}
>> +	    }
>
>subreg_offset will differ in meaning if big-endian; is this correct
>there, do all the stars align so this code works out fine there as
>well?
>
>Looks fine otherwise, thanks :-)
>
>
>Segher
Segher Boessenkool Oct. 14, 2020, 7:23 p.m. UTC | #5
On Wed, Oct 14, 2020 at 07:55:55PM +0200, Richard Biener wrote:
> On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> >> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> >> <segher@kernel.crashing.org> wrote:
> >> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> >> > >   For rtx like
> >> > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> >> > >                    (parallel [(const_int 0) (const_int 1)]))
> >> > >  it could be simplified as inner.
> >> >
> >> > You could even simplify any vec_select of a subreg of X to just a
> >> > vec_select of X, by changing the selection vector a bit (well, only
> >do
> >> 
> >> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to
> >selection.
> >
> >Exactly.
> >
> >> > this if that is a constant vector, I suppose).  Not just for
> >paradoxical
> >> > subregs either, just for *all* subregs.
> >> 
> >> Yes, and only when X has the same inner mode and more elements.
> >
> >No, for *all*.  The mode of the first argument of vec_select does not
> >have to equal its result mode.
> 
> But IIRC the component mode needs to match. 

Yeah, good point, at least the i386 backend uses crazy subregs, which
is why validate_subreg does not test this :-(


Segher
Hongtao Liu Oct. 15, 2020, 8:14 a.m. UTC | #6
On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > > >   For rtx like
> > > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > > >                    (parallel [(const_int 0) (const_int 1)]))
> > > >  it could be simplified as inner.
> > >
> > > You could even simplify any vec_select of a subreg of X to just a
> > > vec_select of X, by changing the selection vector a bit (well, only do
> >
> > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.
>
> Exactly.
>
> > > this if that is a constant vector, I suppose).  Not just for paradoxical
> > > subregs either, just for *all* subregs.
> >
> > Yes, and only when X has the same inner mode and more elements.
>
> No, for *all*.  The mode of the first argument of vec_select does not
> have to equal its result mode.
>
> Any (constant indices) vec_select of a subreg can be written as just a
> vec_select.
>
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when available.  */
>
> What does "when available" mean here?
>

When X has same component mode as vec_select, i'll add this to comments.

> > +       int l2;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && (GET_MODE_INNER (mode)
> > +               == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
>
> Don't use unnecessary parens here please, it makes it harder to read
> (there are quite enough parens already :-) )
>

Yes.

> > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > +                                        &subreg_offset));
>
> Why is this needed?

I only found this interface for poly_uint64 division to get subreg_offset.

>
> > +           bool success = true;
> > +           for (int i = 0;i != l1; i++)
>
> (space after ; )
>
> > +             {
> > +               rtx j = XVECEXP (trueop1, 0, i);
>
> (i and j and k ususally are integers, not rtx)
>
> > +               if (!CONST_INT_P (j)
> > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > +                 {
> > +                   success = false;
> > +                   break;
> > +                 }
> > +             }
>
> You don't have to test if the input RTL is valid.  You can assume it is.
>

This test is for something like (vec_select:v2di (subreg:v4di
(reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist?

> > +           if (success)
> > +             {
> > +               rtx par = trueop1;
> > +               if (subreg_offset)
> > +                 {
> > +                   rtvec vec = rtvec_alloc (l1);
> > +                   for (int i = 0; i < l1; i++)
> > +                     RTVEC_ELT (vec, i)
> > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > +                                          + subreg_offset));
> > +                   par = gen_rtx_PARALLEL (VOIDmode, vec);
> > +                 }
> > +               return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> > +             }
> > +         }
>
> subreg_offset will differ in meaning if big-endian; is this correct
Yes.
> there, do all the stars align so this code works out fine there as well?
>

i found it's a bit tricky to adjust selection index for target
BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
Especially for component mode smaller than word, Any interface to handle this?

> Looks fine otherwise, thanks :-)
>
>
> Segher
Hongtao Liu Oct. 15, 2020, 9:58 a.m. UTC | #7
On Thu, Oct 15, 2020 at 4:14 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > Hi!
> >
> > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote:
> > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
> > > <segher@kernel.crashing.org> wrote:
> > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> > > > >   For rtx like
> > > > >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > > > >                    (parallel [(const_int 0) (const_int 1)]))
> > > > >  it could be simplified as inner.
> > > >
> > > > You could even simplify any vec_select of a subreg of X to just a
> > > > vec_select of X, by changing the selection vector a bit (well, only do
> > >
> > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection.
> >
> > Exactly.
> >
> > > > this if that is a constant vector, I suppose).  Not just for paradoxical
> > > > subregs either, just for *all* subregs.
> > >
> > > Yes, and only when X has the same inner mode and more elements.
> >
> > No, for *all*.  The mode of the first argument of vec_select does not
> > have to equal its result mode.
> >
> > Any (constant indices) vec_select of a subreg can be written as just a
> > vec_select.
> >
> > > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > > +          when available.  */
> >
> > What does "when available" mean here?
> >
>
> When X has same component mode as vec_select, i'll add this to comments.
>
> > > +       int l2;
> > > +       if (GET_CODE (trueop0) == SUBREG
> > > +           && (GET_MODE_INNER (mode)
> > > +               == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
> >
> > Don't use unnecessary parens here please, it makes it harder to read
> > (there are quite enough parens already :-) )
> >
>
> Yes.
>
> > > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > > +                                        &subreg_offset));
> >
> > Why is this needed?
>
> I only found this interface for poly_uint64 division to get subreg_offset.
>
> >
> > > +           bool success = true;
> > > +           for (int i = 0;i != l1; i++)
> >
> > (space after ; )
> >
> > > +             {
> > > +               rtx j = XVECEXP (trueop1, 0, i);
> >
> > (i and j and k ususally are integers, not rtx)
> >
> > > +               if (!CONST_INT_P (j)
> > > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > > +                 {
> > > +                   success = false;
> > > +                   break;
> > > +                 }
> > > +             }
> >
> > You don't have to test if the input RTL is valid.  You can assume it is.
> >
>
> This test is for something like (vec_select:v2di (subreg:v4di
> (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
> const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist?
>

Remove the test.

> > > +           if (success)
> > > +             {
> > > +               rtx par = trueop1;
> > > +               if (subreg_offset)
> > > +                 {
> > > +                   rtvec vec = rtvec_alloc (l1);
> > > +                   for (int i = 0; i < l1; i++)
> > > +                     RTVEC_ELT (vec, i)
> > > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > > +                                          + subreg_offset));
> > > +                   par = gen_rtx_PARALLEL (VOIDmode, vec);
> > > +                 }
> > > +               return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par);
> > > +             }
> > > +         }
> >
> > subreg_offset will differ in meaning if big-endian; is this correct
> Yes.
> > there, do all the stars align so this code works out fine there as well?
> >
>
> i found it's a bit tricky to adjust selection index for target
> BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
> Especially for component mode smaller than word, Any interface to handle this?
>

Use subreg_lsb to get offset from least significant bits, suppose this
could be independent of big/little-endian.

> > Looks fine otherwise, thanks :-)
> >
> >
> > Segher
>
>
>
> --
> BR,
> Hongtao

Update Patch.
Richard Sandiford Oct. 15, 2020, 12:38 p.m. UTC | #8
Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  int l2;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))

Better to use SUBREG_REG here and below.

> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> +		  .is_constant (&l2)
> +	      && known_le (l1, l2))
> +	    {
> +	      unsigned HOST_WIDE_INT subreg_offset = 0;
> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +	      gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> +					   GET_MODE_SIZE (GET_MODE_INNER (mode)),
> +					   &subreg_offset));

can_div_trunc_p discards the remainder, whereas it looks like here
you want an exact multiple.

I don't think it's absolutely guaranteed that the “if” condition makes
the division by GET_MODE_SIZE exact.  E.g. in principle you could have
a subreg of a vector of TIs in which the subreg offset is misaligned by
a DI offset.

I'm not sure the subreg_lsb conversion is correct though.  On big-endian
targets, lane numbering follows memory layout, just like subreg byte
offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
was correct.

In summary, I think the "if” condition should include something like:

  constant_mulitple_p (SUBREG_BYTE (trueop0),
                       GET_MODE_UNIT_BITSIZE (mode),
                       &subreg_offset)

Thanks,
Richard
Hongtao Liu Oct. 19, 2020, 5:18 a.m. UTC | #9
On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when X has same component mode as vec_select.  */
> > +       int l2;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
>
> Better to use SUBREG_REG here and below.
>

Yes and changed.

> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> > +               .is_constant (&l2)
> > +           && known_le (l1, l2))
> > +         {
> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > +                                        &subreg_offset));
>
> can_div_trunc_p discards the remainder, whereas it looks like here
> you want an exact multiple.
>
> I don't think it's absolutely guaranteed that the “if” condition makes
> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
> a subreg of a vector of TIs in which the subreg offset is misaligned by
> a DI offset.
>
> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
> targets, lane numbering follows memory layout, just like subreg byte
> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
> was correct.
>
> In summary, I think the "if” condition should include something like:
>
>   constant_mulitple_p (SUBREG_BYTE (trueop0),
>                        GET_MODE_UNIT_BITSIZE (mode),
>                        &subreg_offset)
>

Changed.

> Thanks,
> Richard


Update patch.
Richard Sandiford Oct. 19, 2020, 3:31 p.m. UTC | #10
Hongtao Liu <crazylht@gmail.com> writes:
> On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
>> > +          when X has same component mode as vec_select.  */
>> > +       int l2;
>> > +       if (GET_CODE (trueop0) == SUBREG
>> > +           && GET_MODE_INNER (mode)
>> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
>>
>> Better to use SUBREG_REG here and below.
>>
>
> Yes and changed.
>
>> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
>> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
>> > +               .is_constant (&l2)
>> > +           && known_le (l1, l2))
>> > +         {
>> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
>> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
>> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
>> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
>> > +                                        &subreg_offset));
>>
>> can_div_trunc_p discards the remainder, whereas it looks like here
>> you want an exact multiple.
>>
>> I don't think it's absolutely guaranteed that the “if” condition makes
>> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
>> a subreg of a vector of TIs in which the subreg offset is misaligned by
>> a DI offset.
>>
>> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
>> targets, lane numbering follows memory layout, just like subreg byte
>> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
>> was correct.
>>
>> In summary, I think the "if” condition should include something like:
>>
>>   constant_mulitple_p (SUBREG_BYTE (trueop0),
>>                        GET_MODE_UNIT_BITSIZE (mode),
>>                        &subreg_offset)
>>
>
> Changed.
>
>> Thanks,
>> Richard
>
>
> Update patch.
>
> -- 
> BR,
> Hongtao
>
> From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Tue, 13 Oct 2020 15:35:29 +0800
> Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
>  X.
>
> gcc/ChangeLog
> 	PR rtl-optimization/97249
> 	* simplify-rtx.c (simplify_binary_operation_1): Simplify
> 	vec_select of a subreg of X to a vec_select of X.
>
> gcc/testsuite/ChangeLog
>
> 	* gcc.target/i386/pr97249-1.c: New test.
> ---
>  gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
>  2 files changed, 74 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 869f0d11b2e..b1009837b2b 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  		    return subop1;
>  		}
>  	    }
> +
> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  int l2;
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)

Nothing really relies on this last line, and nothing uses l0, so better
to drop it.

> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> +		  .is_constant (&l2)
> +	      && known_le (l1, l2)

I'm not sure the last two &&s are really the important condition.
I think we should drop them for the suggestion below.

> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))
> +	    {
> +

Excess blank line.

> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));

This can just use ==.

> +	      bool success = true;
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx  = XVECEXP (trueop1, 0, i);

Excess space.

> +		  if (!CONST_INT_P (idx))

Here I think we should check:

		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

where:

       poly_uint64 nunits
         = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).

This makes sure that all indices are in range.  In particular, it's
valid for the SUBREG_REG to be narrower than mode, for appropriate
vec_select indices

Thanks,
Richard
Hongtao Liu Oct. 20, 2020, 3:20 a.m. UTC | #11
On Mon, Oct 19, 2020 at 11:31 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu <crazylht@gmail.com> writes:
> > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> >> > +          when X has same component mode as vec_select.  */
> >> > +       int l2;
> >> > +       if (GET_CODE (trueop0) == SUBREG
> >> > +           && GET_MODE_INNER (mode)
> >> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
> >>
> >> Better to use SUBREG_REG here and below.
> >>
> >
> > Yes and changed.
> >
> >> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> >> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> >> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
> >> > +               .is_constant (&l2)
> >> > +           && known_le (l1, l2))
> >> > +         {
> >> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
> >> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> >> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
> >> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> >> > +                                        &subreg_offset));
> >>
> >> can_div_trunc_p discards the remainder, whereas it looks like here
> >> you want an exact multiple.
> >>
> >> I don't think it's absolutely guaranteed that the “if” condition makes
> >> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
> >> a subreg of a vector of TIs in which the subreg offset is misaligned by
> >> a DI offset.
> >>
> >> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
> >> targets, lane numbering follows memory layout, just like subreg byte
> >> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
> >> was correct.
> >>
> >> In summary, I think the "if” condition should include something like:
> >>
> >>   constant_mulitple_p (SUBREG_BYTE (trueop0),
> >>                        GET_MODE_UNIT_BITSIZE (mode),
> >>                        &subreg_offset)
> >>
> >
> > Changed.
> >
> >> Thanks,
> >> Richard
> >
> >
> > Update patch.
> >
> > --
> > BR,
> > Hongtao
> >
> > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
> > From: liuhongt <hongtao.liu@intel.com>
> > Date: Tue, 13 Oct 2020 15:35:29 +0800
> > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
> >  X.
> >
> > gcc/ChangeLog
> >       PR rtl-optimization/97249
> >       * simplify-rtx.c (simplify_binary_operation_1): Simplify
> >       vec_select of a subreg of X to a vec_select of X.
> >
> > gcc/testsuite/ChangeLog
> >
> >       * gcc.target/i386/pr97249-1.c: New test.
> > ---
> >  gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
> >  2 files changed, 74 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
> >
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index 869f0d11b2e..b1009837b2b 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
> >                   return subop1;
> >               }
> >           }
> > +
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when X has same component mode as vec_select.  */
> > +       int l2;
> > +       unsigned HOST_WIDE_INT subreg_offset = 0;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
>
> Nothing really relies on this last line, and nothing uses l0, so better
> to drop it.
>

Changed, so there won't be any vector mode with variable number elts.

> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> > +               .is_constant (&l2)
> > +           && known_le (l1, l2)
>
> I'm not sure the last two &&s are really the important condition.
> I think we should drop them for the suggestion below.
>

Changed, assume gcc also support something like (vec_select:v4di
(reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1)
(const_int 0)]))
as long as the range of selection guaranteed by
  || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

> > +           && constant_multiple_p (SUBREG_BYTE (trueop0),
> > +                                   GET_MODE_UNIT_BITSIZE (mode),
> > +                                   &subreg_offset))
> > +         {
> > +
>
> Excess blank line.
>

Changed.

> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
>
> This can just use ==.
>

Changed.

> > +           bool success = true;
> > +           for (int i = 0; i != l1; i++)
> > +             {
> > +               rtx idx  = XVECEXP (trueop1, 0, i);
>
> Excess space.

Changed.

>
> > +               if (!CONST_INT_P (idx))
>
> Here I think we should check:
>
>                       || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> where:
>
>        poly_uint64 nunits
>          = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).
>

Changed.

> This makes sure that all indices are in range.  In particular, it's
> valid for the SUBREG_REG to be narrower than mode, for appropriate
> vec_select indices
>

Yes, that's what paradoxical subreg means.

> Thanks,
> Richard




--
BR,
Hongtao
Richard Sandiford Oct. 20, 2020, 4:42 p.m. UTC | #12
Hongtao Liu <crazylht@gmail.com> writes:
>> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>> > +           && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
>> > +               .is_constant (&l2)
>> > +           && known_le (l1, l2)
>>
>> I'm not sure the last two &&s are really the important condition.
>> I think we should drop them for the suggestion below.
>>
>
> Changed, assume gcc also support something like (vec_select:v4di
> (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1)
> (const_int 0)]))
> as long as the range of selection guaranteed by
>   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

Yeah, that vec_select looks OK.

>>
>> > +               if (!CONST_INT_P (idx))
>>
>> Here I think we should check:
>>
>>                       || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>>
>> where:
>>
>>        poly_uint64 nunits
>>          = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).
>>
>
> Changed.
>
>> This makes sure that all indices are in range.  In particular, it's
>> valid for the SUBREG_REG to be narrower than mode, for appropriate
>> vec_select indices
>>
>
> Yes, that's what paradoxical subreg means.

But I was comparing the mode of the vec_select with the mode of the
SUBREG_REG (rather than the mode of trueop0 with the mode of the
SUBREG_REG, which is what matters for paradoxical subregs).

> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)

Unnecessary brackets around “GET_MODE_NUNITS (mode)”.

> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))

Sorry, my bad, this should be:

	      && constant_multiple_p (subreg_memory_offset (trueop0),
				      GET_MODE_UNIT_BITSIZE (mode),
				      &subreg_offset))

> +	    {
> +	      gcc_assert (XVECLEN (trueop1, 0) == l1);
> +	      bool success = true;
> +	      poly_uint64 nunits
> +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (idx)
> +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> +		    {
> +		      success = false;
> +		      break;
> +		    }
> +		}
> +	      if (success)
> +		{
> +		  rtx par = trueop1;
> +		  if (subreg_offset)
> +		    {
> +		      rtvec vec = rtvec_alloc (l1);
> +		      for (int i = 0; i < l1; i++)
> +			RTVEC_ELT (vec, i)
> +			  = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> +					     + subreg_offset));

This is applying subreg_offset to the pointer rather than the INTVAL.
It should be:

			  = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i))
				     + subreg_offset);

OK with those changes, thanks.

Richard
Segher Boessenkool Oct. 20, 2020, 8:43 p.m. UTC | #13
On Thu, Oct 15, 2020 at 04:14:39PM +0800, Hongtao Liu wrote:
> On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > > +           gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0),
> > > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
> > > +                                        &subreg_offset));
> >
> > Why is this needed?
> 
> I only found this interface for poly_uint64 division to get subreg_offset.

I mean, why do you have this assert at all?

> > > +               if (!CONST_INT_P (j)
> > > +                   || known_ge (UINTVAL (j), l2 - subreg_offset))
> > > +                 {
> > > +                   success = false;
> > > +                   break;
> > > +                 }
> > > +             }
> >
> > You don't have to test if the input RTL is valid.  You can assume it is.
> >
> 
> This test is for something like (vec_select:v2di (subreg:v4di
> (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])).
> const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist?

Assuming this is LE: yes, this is just invalid.  You can do whatever you
want with it (except ICE :-) )

> > subreg_offset will differ in meaning if big-endian; is this correct
> Yes.
> > there, do all the stars align so this code works out fine there as well?
> 
> i found it's a bit tricky to adjust selection index for target
> BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN.
> Especially for component mode smaller than word, Any interface to handle this?

For most things you want BYTES_BIG_ENDIAN, anything in a subreg here for
example.  I don't know which of those vectors use; I cannot find it in
the documentation, either.


Segher
Segher Boessenkool Oct. 20, 2020, 9:05 p.m. UTC | #14
On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote:
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))
> +	    {
> +	      gcc_assert (XVECLEN (trueop1, 0) == l1);

Why?  If we want to check that, it should be in RTL checking (and maybe
it already is!)

> +	      bool success = true;
> +	      poly_uint64 nunits
> +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (idx)
> +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

Can that ever happen in valid code?  This seems to just hide problems.

> +		    {
> +		      success = false;
> +		      break;
> +		    }
> +		}
> +	      if (success)

If you have a huge piece of code like this, factor it?  Esp. if you now
need to have all kinds of booleans where you really just want to do
early returns.


Segher
Hongtao Liu Oct. 21, 2020, 2:43 a.m. UTC | #15
On Wed, Oct 21, 2020 at 12:42 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hongtao Liu <crazylht@gmail.com> writes:
> >> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> >> > +           && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> >> > +               .is_constant (&l2)
> >> > +           && known_le (l1, l2)
> >>
> >> I'm not sure the last two &&s are really the important condition.
> >> I think we should drop them for the suggestion below.
> >>
> >
> > Changed, assume gcc also support something like (vec_select:v4di
> > (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1)
> > (const_int 0)]))
> > as long as the range of selection guaranteed by
> >   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> Yeah, that vec_select looks OK.
>
> >>
> >> > +               if (!CONST_INT_P (idx))
> >>
> >> Here I think we should check:
> >>
> >>                       || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> >>
> >> where:
> >>
> >>        poly_uint64 nunits
> >>          = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).
> >>
> >
> > Changed.
> >
> >> This makes sure that all indices are in range.  In particular, it's
> >> valid for the SUBREG_REG to be narrower than mode, for appropriate
> >> vec_select indices
> >>
> >
> > Yes, that's what paradoxical subreg means.
>
> But I was comparing the mode of the vec_select with the mode of the
> SUBREG_REG (rather than the mode of trueop0 with the mode of the
> SUBREG_REG, which is what matters for paradoxical subregs).
>
> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
> > +          when X has same component mode as vec_select.  */
> > +       unsigned HOST_WIDE_INT subreg_offset = 0;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>
> Unnecessary brackets around “GET_MODE_NUNITS (mode)”.
>

Changed.

> > +           && constant_multiple_p (SUBREG_BYTE (trueop0),
> > +                                   GET_MODE_UNIT_BITSIZE (mode),
> > +                                   &subreg_offset))
>
> Sorry, my bad, this should be:
>
>               && constant_multiple_p (subreg_memory_offset (trueop0),
>                                       GET_MODE_UNIT_BITSIZE (mode),
>                                       &subreg_offset))
>

Changed.

> > +         {
> > +           gcc_assert (XVECLEN (trueop1, 0) == l1);
> > +           bool success = true;
> > +           poly_uint64 nunits
> > +             = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > +           for (int i = 0; i != l1; i++)
> > +             {
> > +               rtx idx = XVECEXP (trueop1, 0, i);
> > +               if (!CONST_INT_P (idx)
> > +                   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > +                 {
> > +                   success = false;
> > +                   break;
> > +                 }
> > +             }
> > +           if (success)
> > +             {
> > +               rtx par = trueop1;
> > +               if (subreg_offset)
> > +                 {
> > +                   rtvec vec = rtvec_alloc (l1);
> > +                   for (int i = 0; i < l1; i++)
> > +                     RTVEC_ELT (vec, i)
> > +                       = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i)
> > +                                          + subreg_offset));
>
> This is applying subreg_offset to the pointer rather than the INTVAL.
> It should be:
>
>                           = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i))
>                                      + subreg_offset);
>

oops, sorry for typo and changed.

> OK with those changes, thanks.
>
> Richard
Hongtao Liu Oct. 21, 2020, 3:17 a.m. UTC | #16
On Wed, Oct 21, 2020 at 5:07 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote:
> > +       unsigned HOST_WIDE_INT subreg_offset = 0;
> > +       if (GET_CODE (trueop0) == SUBREG
> > +           && GET_MODE_INNER (mode)
> > +              == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +           && constant_multiple_p (SUBREG_BYTE (trueop0),
> > +                                   GET_MODE_UNIT_BITSIZE (mode),
> > +                                   &subreg_offset))
> > +         {
> > +           gcc_assert (XVECLEN (trueop1, 0) == l1);
>
> Why?  If we want to check that, it should be in RTL checking (and maybe
> it already is!)
>

Yes, RTL checking would guarantee that and it should be removed.

> > +           bool success = true;
> > +           poly_uint64 nunits
> > +             = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > +           for (int i = 0; i != l1; i++)
> > +             {
> > +               rtx idx = XVECEXP (trueop1, 0, i);
> > +               if (!CONST_INT_P (idx)
> > +                   || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
>
> Can that ever happen in valid code?  This seems to just hide problems.
>

for rtx like (vec_select:v4di:(subreg:v8di (reg:v2di))
 (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])),
It seems valid for rtl checking.

> > +                 {
> > +                   success = false;
> > +                   break;
> > +                 }
> > +             }
> > +           if (success)
>
> If you have a huge piece of code like this, factor it?  Esp. if you now
> need to have all kinds of booleans where you really just want to do
> early returns.
>

I want to jump out of this if branch, since later codes in this function
 won't simplify VEC_SELECT further when it matches my if condition,
it's ok to use ealry returns.

>
> Segher

Update patch.
Richard Sandiford Oct. 21, 2020, 3:43 p.m. UTC | #17
Hongtao Liu <crazylht@gmail.com> writes:
> +	      poly_uint64 nunits
> +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> +	      rtx par = trueop1;
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx = XVECEXP (trueop1, 0, i);
> +		  if (!CONST_INT_P (idx)
> +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> +		    return 0;
> +		}

I think the previous version was better.  We shouldn't assume that
further simplification rules will fail just because the conditions
for this rule haven't been met.

Thanks,
Richard
Segher Boessenkool Oct. 21, 2020, 4:34 p.m. UTC | #18
On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote:
> Hongtao Liu <crazylht@gmail.com> writes:
> > +	      poly_uint64 nunits
> > +		= GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > +	      rtx par = trueop1;
> > +	      for (int i = 0; i != l1; i++)
> > +		{
> > +		  rtx idx = XVECEXP (trueop1, 0, i);
> > +		  if (!CONST_INT_P (idx)
> > +		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > +		    return 0;
> > +		}
> 
> I think the previous version was better.  We shouldn't assume that
> further simplification rules will fail just because the conditions
> for this rule haven't been met.

Yes.  My suggestion was to factor this big piece of code to a separate
function, and do an early return from *that*.

The patch is okay for trunk without that, with the clumsy booleans.
Thanks Hongtao!


Segher
Hongtao Liu Oct. 22, 2020, 3:33 a.m. UTC | #19
On Thu, Oct 22, 2020 at 12:36 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote:
> > Hongtao Liu <crazylht@gmail.com> writes:
> > > +         poly_uint64 nunits
> > > +           = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)));
> > > +         rtx par = trueop1;
> > > +         for (int i = 0; i != l1; i++)
> > > +           {
> > > +             rtx idx = XVECEXP (trueop1, 0, i);
> > > +             if (!CONST_INT_P (idx)
> > > +                 || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))
> > > +               return 0;
> > > +           }
> >
> > I think the previous version was better.  We shouldn't assume that
> > further simplification rules will fail just because the conditions
> > for this rule haven't been met.
>
> Yes.  My suggestion was to factor this big piece of code to a separate
> function, and do an early return from *that*.
>
> The patch is okay for trunk without that, with the clumsy booleans.
> Thanks Hongtao!
>
>
> Segher

Thank you both for the review, I'll commit the patch with *bool success* keeped.
diff mbox series

Patch

From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of paradoxical subreg.

For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
		   (parallel [(const_int 0) (const_int 1)]))
it could be simplified as inner.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c                        | 27 ++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..9c397157f28 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,33 @@  simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		    return subop1;
 		}
 	    }
+
+	  /* For cases like
+	     (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
+			      (parallel [(const_int 0) (const_int 1)])).
+	     return inner directly.  */
+	  if (GET_CODE (trueop0) == SUBREG
+	      && paradoxical_subreg_p (trueop0)
+	      && mode == GET_MODE (XEXP (trueop0, 0))
+	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	      && l0 % l1 == 0)
+	    {
+	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	      unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
+	      unsigned HOST_WIDE_INT sel = 0;
+	      int i = 0;
+	      for (;i != l1; i++)
+		{
+		  rtx j = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (j))
+		    break;
+		  sel |= HOST_WIDE_INT_1U << UINTVAL (j);
+		}
+	      /* ??? Need to simplify XEXP (trueop0, 0) here.  */
+	      if (sel == expect)
+		return XEXP (trueop0, 0);
+	    }
 	}
 
       if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 00000000000..bc34aa8baa6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@ 
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+    for (int i = 0 ; i != 8; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+    for (int i = 0 ; i != 4; i++)
+     p3[i] = p1[i] + p2[i];
+     return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+    for (int i = 0 ; i != 2; i++)
+      p3[i] = (long long)p1[i] + (long long)p2[i];
+     return;
+}
-- 
2.18.1