
[2/2] combine: Don't turn (mult (extend x) 2^n) into extract

Message ID 20201015085922.qmtum7kw2dlaixq3@arm.com
State New
Series [1/2] aarch64: Remove support for extract-based addresses [PR96998]

Commit Message

Alex Coplan Oct. 15, 2020, 8:59 a.m. UTC
Currently, make_extraction() identifies where we can emit an ASHIFT of
an extend in place of an extraction, but fails to make the corresponding
canonicalization/simplification when presented with a MULT by a power of
two. Such a representation is canonical when representing a left-shifted
address inside a MEM.

This patch remedies this situation: after the patch, make_extraction()
now also identifies RTXs such as:

(mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))

and rewrites this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.

(This patch also fixes up a comment in expand_compound_operation() which
appears to have suffered from bitrot.)

This fixes several quality regressions on AArch64 after removing support for
addresses represented as sign_extract insns (1/2).

In particular, after the fix for PR96998, for the relevant testcase, we
have:

.L2:
        sxtw    x0, w0  // 8    [c=4 l=4]  *extendsidi2_aarch64/0
        add     x0, x19, x0, lsl 2      // 39   [c=8 l=4]  *add_lsl_di
        bl      h               // 11   [c=4 l=4]  *call_value_insn/1
        b       .L2             // 54   [c=4 l=4]  jump

and after this patch, we have:

.L2:
        add     x0, x19, w0, sxtw 2     // 39   [c=8 l=4]  *add_extendsi_shft_di
        bl      h               // 11   [c=4 l=4]  *call_value_insn/1
        b       .L2             // 54   [c=4 l=4]  jump

which restores the original optimal sequence we saw before the AArch64
patch.

Testing:
 * Bootstrapped and regtested on aarch64-linux-gnu, arm-linux-gnueabihf,
   and x86_64-linux-gnu.

OK for trunk?

Thanks,
Alex

---

gcc/ChangeLog:

	* combine.c (expand_compound_operation): Tweak variable name in
	comment to match source.
	(make_extraction): Handle mult by power of two in addition to
	ashift.

---
 gcc/combine.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

Comments

Alex Coplan Oct. 22, 2020, 8:36 a.m. UTC | #1
Ping.

Hopefully this is easier to review/test now that we fix the AArch64 bug first
and deliberately regress code quality so that the impact of the combine patch
can be measured.

On 15/10/2020 09:59, Alex Coplan via Gcc-patches wrote:
> Currently, make_extraction() identifies where we can emit an ASHIFT of
> an extend in place of an extraction, but fails to make the corresponding
> canonicalization/simplification when presented with a MULT by a power of
> two. Such a representation is canonical when representing a left-shifted
> address inside a MEM.
> 
> This patch remedies this situation: after the patch, make_extraction()
> now also identifies RTXs such as:
> 
> (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> 
> and rewrites this as:
> 
> (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> 
> instead of using a sign_extract.
> 
> (This patch also fixes up a comment in expand_compound_operation() which
> appears to have suffered from bitrot.)
> 
> This fixes several quality regressions on AArch64 after removing support for
> addresses represented as sign_extract insns (1/2).
> 
> In particular, after the fix for PR96998, for the relevant testcase, we
> have:
> 
> .L2:
>         sxtw    x0, w0  // 8    [c=4 l=4]  *extendsidi2_aarch64/0
>         add     x0, x19, x0, lsl 2      // 39   [c=8 l=4]  *add_lsl_di
>         bl      h               // 11   [c=4 l=4]  *call_value_insn/1
>         b       .L2             // 54   [c=4 l=4]  jump
> 
> and after this patch, we have:
> 
> .L2:
>         add     x0, x19, w0, sxtw 2     // 39   [c=8 l=4]  *add_extendsi_shft_di
>         bl      h               // 11   [c=4 l=4]  *call_value_insn/1
>         b       .L2             // 54   [c=4 l=4]  jump
> 
> which restores the original optimal sequence we saw before the AArch64
> patch.
> 
> Testing:
>  * Bootstrapped and regtested on aarch64-linux-gnu, arm-linux-gnueabihf,
>    and x86_64-linux-gnu.
> 
> OK for trunk?
> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
> 	* combine.c (expand_compound_operation): Tweak variable name in
> 	comment to match source.
> 	(make_extraction): Handle mult by power of two in addition to
> 	ashift.
> 
> ---
>  gcc/combine.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
Segher Boessenkool Oct. 22, 2020, 8:29 p.m. UTC | #2
Hi Alex,

On Thu, Oct 22, 2020 at 09:36:02AM +0100, Alex Coplan wrote:
> Ping.
> 
> Hopefully this is easier to review/test now that we fix the AArch64 bug first
> and deliberately regress code quality so that the impact of the combine patch
> can be measured.

Yes, I am just busy.  I'll get to it soon.


Segher
Segher Boessenkool Oct. 22, 2020, 8:39 p.m. UTC | #3
On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> Currently, make_extraction() identifies where we can emit an ASHIFT of
> an extend in place of an extraction, but fails to make the corresponding
> canonicalization/simplification when presented with a MULT by a power of
> two. Such a representation is canonical when representing a left-shifted
> address inside a MEM.
> 
> This patch remedies this situation: after the patch, make_extraction()
> now also identifies RTXs such as:
> 
> (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> 
> and rewrites this as:
> 
> (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> 
> instead of using a sign_extract.

That is only correct if SUBREG_PROMOTED_VAR_P is true and
SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
true here (and how then?)


Segher
Segher Boessenkool Oct. 25, 2020, 8:49 p.m. UTC | #4
Hi!

On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> This patch remedies this situation: after the patch, make_extraction()
> now also identifies RTXs such as:
> 
> (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> 
> and rewrites this as:
> 
> (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> 
> instead of using a sign_extract.

Do we do that in simplify-rtx already (w/ the proper SUBREG_PROMOTED_*
stuff set)?  If not, we should; if so, why does that not work here?

(Just the subreg->{sign,zero}_extend that is, nothing with the mult.)


Segher
Alex Coplan Oct. 26, 2020, 10:09 a.m. UTC | #5
Hi Segher,

On 22/10/2020 15:39, Segher Boessenkool wrote:
> On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > Currently, make_extraction() identifies where we can emit an ASHIFT of
> > an extend in place of an extraction, but fails to make the corresponding
> > canonicalization/simplification when presented with a MULT by a power of
> > two. Such a representation is canonical when representing a left-shifted
> > address inside a MEM.
> > 
> > This patch remedies this situation: after the patch, make_extraction()
> > now also identifies RTXs such as:
> > 
> > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> > 
> > and rewrites this as:
> > 
> > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> > 
> > instead of using a sign_extract.
> 
> That is only correct if SUBREG_PROMOTED_VAR_P is true and
> SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
> true here (and how then?)

Sorry, I didn't give enough context here. For this subreg,
SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in
isolation is not valid.

The crucial piece of missing information is that we only make this
transformation in calls to make_extraction where len = 32 + n and
pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and
unsignedp is false (so we're doing a sign_extract).

Below is a proposed commit message, updated with this information.

OK for trunk?

Thanks,
Alex

---

Currently, make_extraction() identifies where we can emit an ashift of
an extend in place of an extraction, but fails to make the corresponding
canonicalization/simplification when presented with a mult by a power of
two. Such a representation is canonical when representing a left-shifted
address inside a mem.

This patch remedies this situation. For rtxes such as:

(mult:DI (subreg:DI (reg:SI r) 0) (const_int 2^n))

where the bottom 32 + n bits are valid (the higher-order bits are
undefined) and make_extraction() is being asked to sign_extract the
lower (valid) bits, after the patch, we rewrite this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.

(This patch also fixes up a comment in expand_compound_operation() which
appears to have suffered from bitrot.)

For an example of the existing behavior in the ashift case, compiling
the following C testcase at -O2 on AArch64:

int h(void);
struct c d;
struct c {
    int e[1];
};

void g(void) {
  int k = 0;
  for (;; k = h()) {
    asm volatile ("" :: "r"(&d.e[k]));
  }
}

make_extraction gets called with len=34, pos_rtx=pos=0, unsignedp=false
(so we're sign_extracting the bottom 34 bits), and the following rtx in
inner:

(ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0)
    (const_int 2 [0x2]))

where it is clear that the bottom 34 bits are valid (and the
higher-order bits are undefined). We then hit the block:

  else if (GET_CODE (inner) == ASHIFT
	   && CONST_INT_P (XEXP (inner, 1))
	   && pos_rtx == 0 && pos == 0
	   && len > UINTVAL (XEXP (inner, 1)))
    {
      /* We're extracting the least significant bits of an rtx
	 (ashift X (const_int C)), where LEN > C.  Extract the
	 least significant (LEN - C) bits of X, giving an rtx
	 whose mode is MODE, then shift it left C times.  */
      new_rtx = make_extraction (mode, XEXP (inner, 0),
			     0, 0, len - INTVAL (XEXP (inner, 1)),
			     unsignedp, in_dest, in_compare);
      if (new_rtx != 0)
	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
    }

and the recursive call to make_extraction() is asked to sign_extract the
bottom (LEN - C) = 32 bits of:

(subreg:DI (reg/v:SI 93) 0)

which gives us:

(sign_extend:DI (reg/v:SI 93 [ k ])). The gen_rtx_ASHIFT call then gives
us the final result:

(ashift:DI (sign_extend:DI (reg/v:SI 93 [ k ]))
    (const_int 2 [0x2])).

Now all that this patch does is to teach the block that looks for these
shifts how to handle the same thing written as a mult instead of an
ashift. In particular, for the testcase in the PR (96998), we hit
make_extraction with len=34, unsignedp=false, pos_rtx=pos=0, and inner
as:

(mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
    (const_int 4 [0x4]))

It should be clear from the above that this can be handled in an
analogous way: the recursive case is precisely the same, the only
difference is that we take the log2 of the shift amount and write the
end result as a mult instead.

This fixes several quality regressions on AArch64 after removing support
for addresses represented as sign_extract insns (1/2).

In particular, after the fix for PR96998, for the relevant testcase, we
have:

.L2:
        sxtw    x0, w0  // 8    [c=4 l=4]  *extendsidi2_aarch64/0
        add     x0, x19, x0, lsl 2      // 39   [c=8 l=4]  *add_lsl_di
        bl      h               // 11   [c=4 l=4]  *call_value_insn/1
        b       .L2             // 54   [c=4 l=4]  jump

and after this patch, we have:

.L2:
        add     x0, x19, w0, sxtw 2     // 39   [c=8 l=4]  *add_extendsi_shft_di
        bl      h               // 11   [c=4 l=4]  *call_value_insn/1
        b       .L2             // 54   [c=4 l=4]  jump

which restores the original optimal sequence we saw before the AArch64
patch.

---

gcc/ChangeLog:

	* combine.c (expand_compound_operation): Tweak variable name in
	comment to match source.
	(make_extraction): Handle mult by power of two in addition to
	ashift.
diff --git a/gcc/combine.c b/gcc/combine.c
index c88382efbd3..fe8eff2b464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 	is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
 	   && CONST_INT_P (XEXP (inner, 1))
-	   && pos_rtx == 0 && pos == 0
-	   && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-	 (ashift X (const_int C)), where LEN > C.  Extract the
-	 least significant (LEN - C) bits of X, giving an rtx
-	 whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-			     0, 0, len - INTVAL (XEXP (inner, 1)),
-			     unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	   && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+	{
+	  /* We're extracting the least significant bits of an rtx
+	     (ashift X (const_int C)) or (mult X (const_int (2^C))),
+	     where LEN > C.  Extract the least significant (LEN - C) bits
+	     of X, giving an rtx whose mode is MODE, then shift it left
+	     C times.  */
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				 0, 0, len - shift_amt,
+				 unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	}
     }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
Segher Boessenkool Oct. 26, 2020, 10:48 a.m. UTC | #6
Hi!

On Mon, Oct 26, 2020 at 10:09:41AM +0000, Alex Coplan wrote:
> On 22/10/2020 15:39, Segher Boessenkool wrote:
> > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > > Currently, make_extraction() identifies where we can emit an ASHIFT of
> > > an extend in place of an extraction, but fails to make the corresponding
> > > canonicalization/simplification when presented with a MULT by a power of
> > > two. Such a representation is canonical when representing a left-shifted
> > > address inside a MEM.
> > > 
> > > This patch remedies this situation: after the patch, make_extraction()
> > > now also identifies RTXs such as:
> > > 
> > > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> > > 
> > > and rewrites this as:
> > > 
> > > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> > > 
> > > instead of using a sign_extract.
> > 
> > That is only correct if SUBREG_PROMOTED_VAR_P is true and
> > SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
> > true here (and how then?)
> 
> Sorry, I didn't give enough context here. For this subreg,
> SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in
> isolation is not valid.
> 
> The crucial piece of missing information is that we only make this
> transformation in calls to make_extraction where len = 32 + n and
> pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and
> unsignedp is false (so we're doing a sign_extract).

The high half of a DI subreg of a SI reg is *undefined* if
SUBREG_PROMOTED_VAR_P is not set.  So the code you get as input:

> (ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0)
>     (const_int 2 [0x2]))

... is already incorrect.  Please fix that?

> where it is clear that the bottom 34 bits are valid (and the
> higher-order bits are undefined). We then hit the block:

No, only the bottom 32 bits are valid.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index c88382efbd3..fe8eff2b464 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
>      }
>  
>    /* If we reach here, we want to return a pair of shifts.  The inner
> -     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
> -     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
> +     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
> +     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
>       logical depending on the value of UNSIGNEDP.
>  
>       If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be

MODEWIDTH isn't defined here yet, it is initialised just below to
MODE_PRECISION (mode).


Segher
Alex Coplan Oct. 26, 2020, 11:06 a.m. UTC | #7
On 26/10/2020 05:48, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Oct 26, 2020 at 10:09:41AM +0000, Alex Coplan wrote:
> > On 22/10/2020 15:39, Segher Boessenkool wrote:
> > > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > > > Currently, make_extraction() identifies where we can emit an ASHIFT of
> > > > an extend in place of an extraction, but fails to make the corresponding
> > > > canonicalization/simplification when presented with a MULT by a power of
> > > > two. Such a representation is canonical when representing a left-shifted
> > > > address inside a MEM.
> > > > 
> > > > This patch remedies this situation: after the patch, make_extraction()
> > > > now also identifies RTXs such as:
> > > > 
> > > > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
> > > > 
> > > > and rewrites this as:
> > > > 
> > > > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
> > > > 
> > > > instead of using a sign_extract.
> > > 
> > > That is only correct if SUBREG_PROMOTED_VAR_P is true and
> > > SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
> > > true here (and how then?)
> > 
> > Sorry, I didn't give enough context here. For this subreg,
> > SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in
> > isolation is not valid.
> > 
> > The crucial piece of missing information is that we only make this
> > transformation in calls to make_extraction where len = 32 + n and
> > pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and
> > unsignedp is false (so we're doing a sign_extract).
> 
> The high half of a DI subreg of a SI reg is *undefined* if
> SUBREG_PROMOTED_VAR_P is not set.  So the code you get as input:
> 
> > (ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0)
> >     (const_int 2 [0x2]))
> 
> ... is already incorrect.  Please fix that?
> 
> > where it is clear that the bottom 34 bits are valid (and the
> > higher-order bits are undefined). We then hit the block:
> 
> No, only the bottom 32 bits are valid.

Well, only the low 32 bits of the subreg are valid. But because those
low 32 bits are shifted left 2 times, the low 34 bits of the ashift are
valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above
those are from the inner SImode reg (with the upper 62 bits being
undefined).

> 
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index c88382efbd3..fe8eff2b464 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
> >      }
> >  
> >    /* If we reach here, we want to return a pair of shifts.  The inner
> > -     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
> > -     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
> > +     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
> > +     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
> >       logical depending on the value of UNSIGNEDP.
> >  
> >       If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
> 
> MODEWIDTH isn't defined here yet, it is initialised just below to
> MODE_PRECISION (mode).

Yes, but bitsize isn't defined at all in this function AFAICT. Are
comments not permitted to refer to variables defined immediately beneath
them?

Alex
Alex Coplan Oct. 26, 2020, 11:10 a.m. UTC | #8
On 26/10/2020 11:06, Alex Coplan via Gcc-patches wrote:

> Well, only the low 32 bits of the subreg are valid. But because those
> low 32 bits are shifted left 2 times, the low 34 bits of the ashift are
> valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above
> those are from the inner SImode reg (with the upper 62 bits being
> undefined).

s/upper 62 bits/upper 30 bits/
Segher Boessenkool Oct. 26, 2020, 11:51 a.m. UTC | #9
On Mon, Oct 26, 2020 at 11:06:22AM +0000, Alex Coplan wrote:
> Well, only the low 32 bits of the subreg are valid. But because those
> low 32 bits are shifted left 2 times, the low 34 bits of the ashift are
> valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above
> those are from the inner SImode reg (with the upper 62 bits being
> undefined).

Ugh.  Yes, I think you are right.  One more reason why we should only
use *explicit* sign/zero extends, none of this confusing subreg
business :-(

> > > diff --git a/gcc/combine.c b/gcc/combine.c
> > > index c88382efbd3..fe8eff2b464 100644
> > > --- a/gcc/combine.c
> > > +++ b/gcc/combine.c
> > > @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
> > >      }
> > >  
> > >    /* If we reach here, we want to return a pair of shifts.  The inner
> > > -     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
> > > -     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
> > > +     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
> > > +     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
> > >       logical depending on the value of UNSIGNEDP.
> > >  
> > >       If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
> > 
> > MODEWIDTH isn't defined here yet, it is initialised just below to
> > MODE_PRECISION (mode).
> 
> Yes, but bitsize isn't defined at all in this function AFAICT. Are
> comments not permitted to refer to variables defined immediately beneath
> them?

Of course you can -- comments are free form text after all -- but as
written it suggest there already is an initialised variable "modewidth".

Just move the initialisation to above this comment?


Segher
Segher Boessenkool Oct. 26, 2020, 12:12 p.m. UTC | #10
Hi!

On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> @@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
>  	is_mode = GET_MODE (SUBREG_REG (inner));
>        inner = SUBREG_REG (inner);
>      }
> +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
> +	   && pos_rtx == 0 && pos == 0)
> +    {
> +      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +      const auto code = GET_CODE (inner);
> +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;

Can you instead replace the mult by a shift somewhere earlier in
make_extract?  That would make a lot more sense :-)


Segher
Alex Coplan Oct. 26, 2020, 1:18 p.m. UTC | #11
On 26/10/2020 06:51, Segher Boessenkool wrote:
> On Mon, Oct 26, 2020 at 11:06:22AM +0000, Alex Coplan wrote:
> > Well, only the low 32 bits of the subreg are valid. But because those
> > low 32 bits are shifted left 2 times, the low 34 bits of the ashift are
> > valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above
> > those are from the inner SImode reg (with the upper 62 bits being
> > undefined).
> 
> Ugh.  Yes, I think you are right.  One more reason why we should only
> use *explicit* sign/zero extends, none of this confusing subreg
> business :-(

Yeah. IIRC expand_compound_operation() introduces the subreg because it
explicitly wants to rewrite the sign_extend using a pair of shifts (without
using an extend rtx). Something like:

(ashiftrt:DI
  (ashift:DI
    (subreg:DI (reg:SI r) 0)
    (const_int 32))
  (const_int 32))

> 
> > > > diff --git a/gcc/combine.c b/gcc/combine.c
> > > > index c88382efbd3..fe8eff2b464 100644
> > > > --- a/gcc/combine.c
> > > > +++ b/gcc/combine.c
> > > > @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
> > > >      }
> > > >  
> > > >    /* If we reach here, we want to return a pair of shifts.  The inner
> > > > -     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
> > > > -     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
> > > > +     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
> > > > +     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
> > > >       logical depending on the value of UNSIGNEDP.
> > > >  
> > > >       If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
> > > 
> > > MODEWIDTH isn't defined here yet, it is initialised just below to
> > > MODE_PRECISION (mode).
> > 
> > Yes, but bitsize isn't defined at all in this function AFAICT. Are
> > comments not permitted to refer to variables defined immediately beneath
> > them?
> 
> Of course you can -- comments are free form text after all -- but as
> written it suggest there already is an initialised variable "modewidth".
> 
> Just move the initialisation to above this comment?

Sure, see the revised patch attached.

Thanks,
Alex
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..d4793c1c575 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7418,9 +7418,11 @@ expand_compound_operation (rtx x)
 
     }
 
+  modewidth = GET_MODE_PRECISION (mode);
+
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7433,7 +7435,6 @@ expand_compound_operation (rtx x)
      extraction.  Then the constant of 31 would be substituted in
      to produce such a position.  */
 
-  modewidth = GET_MODE_PRECISION (mode);
   if (modewidth >= pos + len)
     {
       tem = gen_lowpart (mode, XEXP (x, 0));
@@ -7650,20 +7651,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 	is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
 	   && CONST_INT_P (XEXP (inner, 1))
-	   && pos_rtx == 0 && pos == 0
-	   && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-	 (ashift X (const_int C)), where LEN > C.  Extract the
-	 least significant (LEN - C) bits of X, giving an rtx
-	 whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-			     0, 0, len - INTVAL (XEXP (inner, 1)),
-			     unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	   && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+	{
+	  /* We're extracting the least significant bits of an rtx
+	     (ashift X (const_int C)) or (mult X (const_int (2^C))),
+	     where LEN > C.  Extract the least significant (LEN - C) bits
+	     of X, giving an rtx whose mode is MODE, then shift it left
+	     C times.  */
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				 0, 0, len - shift_amt,
+				 unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	}
     }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
Alex Coplan Oct. 26, 2020, 1:28 p.m. UTC | #12
On 26/10/2020 07:12, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > @@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
> >  	is_mode = GET_MODE (SUBREG_REG (inner));
> >        inner = SUBREG_REG (inner);
> >      }
> > +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
> > +	   && pos_rtx == 0 && pos == 0)
> > +    {
> > +      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> > +      const auto code = GET_CODE (inner);
> > +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
> 
> Can you instead replace the mult by a shift somewhere earlier in
> make_extract?  That would make a lot more sense :-)

I guess we could do this, the only complication being that we can't
unconditionally rewrite the expression using a shift, since mult is canonical
inside a mem (which is why we see it in the testcase in the PR).

So if we did this, we'd have to remember that we did it earlier on, and rewrite
it back to a mult accordingly.

Would you still like to see a version of the patch that does that, or is this
version OK: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557050.html ?

Thanks,
Alex
Segher Boessenkool Oct. 26, 2020, 5:43 p.m. UTC | #13
On Mon, Oct 26, 2020 at 01:28:42PM +0000, Alex Coplan wrote:
> On 26/10/2020 07:12, Segher Boessenkool wrote:
> > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > Can you instead replace the mult by a shift somewhere earlier in
> > make_extract?  That would make a lot more sense :-)
> 
> I guess we could do this, the only complication being that we can't
> unconditionally rewrite the expression using a shift, since mult is canonical
> inside a mem (which is why we see it in the testcase in the PR).

You can do it just inside the block you are already editing.

> So if we did this, we'd have to remember that we did it earlier on, and rewrite
> it back to a mult accordingly.

Yes, this function has ridiculously complicated control flow.  So I
cannot trick you into improving it? ;-)

> Would you still like to see a version of the patch that does that, or is this
> version OK: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557050.html ?

I do not like handling both mult and ashift in one case like this, it
complicates things for no good reason.  Write it as two cases, and it
should be good.


Segher
Segher Boessenkool Oct. 26, 2020, 5:48 p.m. UTC | #14
On Mon, Oct 26, 2020 at 01:18:54PM +0000, Alex Coplan wrote:
> -  else if (GET_CODE (inner) == ASHIFT
> +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)

As I wrote in the other mail, write this as two cases.  Write something
in the comment for the mult one that this is for the canonicalisation of
memory addresses (feel free to use swear words).

> +    {
> +      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +      const auto code = GET_CODE (inner);
> +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
> +
> +      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)

Space after cast; better is to not need a cast at all (and you do not
need one, len is unsigned HOST_WIDE_INT already).


Segher
Alex Coplan Oct. 27, 2020, 10:35 a.m. UTC | #15
On 26/10/2020 12:43, Segher Boessenkool wrote:
> On Mon, Oct 26, 2020 at 01:28:42PM +0000, Alex Coplan wrote:
> > On 26/10/2020 07:12, Segher Boessenkool wrote:
> > > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > > Can you instead replace the mult by a shift somewhere earlier in
> > > make_extract?  That would make a lot more sense :-)
> > 
> > I guess we could do this, the only complication being that we can't
> > unconditionally rewrite the expression using a shift, since mult is canonical
> > inside a mem (which is why we see it in the testcase in the PR).
> 
> You can do it just inside the block you are already editing.
> 
> > So if we did this, we'd have to remember that we did it earlier on, and rewrite
> > it back to a mult accordingly.
> 
> Yes, this function has ridiculously complicated control flow.  So I
> cannot trick you into improving it? ;-)
> 
> > Would you still like to see a version of the patch that does that, or is this
> > version OK: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557050.html ?
> 
> I do not like handling both mult and ashift in one case like this, it
> complicates things for no good reason.  Write it as two cases, and it
> should be good.

OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
of make_extraction so that the existing ASHIFT block can do the work for
us. We remember if we did it and then convert it back if necessary.

I'm not convinced that it's an improvement. What do you think?

Bootstrap/regtest in progress on aarch64-none-linux-gnu. I'll test other
platforms (as well as testing on top of 1/2) and repost with a proper
commit message if you think it looks good.

Alex

---

gcc/ChangeLog:

	* combine.c (make_extraction): Temporarily rewrite (mult x 2^n) so that we
	can handle it as (ashift x n) and avoid emitting an extract where
	extend+shift will suffice.
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..991dc5eabf7 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7628,10 +7628,25 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
   rtx new_rtx = 0;
   rtx orig_pos_rtx = pos_rtx;
   HOST_WIDE_INT orig_pos;
+  bool rewrote_mult_p = false;
 
   if (pos_rtx && CONST_INT_P (pos_rtx))
     pos = INTVAL (pos_rtx), pos_rtx = 0;
 
+  if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1)))
+    {
+      /* We have to handle shifts disguised as multiplications
+	 by powers of two since this is the canonical form for
+	 mem addresses.  */
+      const int shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+      if (shift_amt > 0)
+	{
+	  PUT_CODE (inner, ASHIFT);
+	  INTVAL (XEXP (inner, 1)) = shift_amt;
+	  rewrote_mult_p = true;
+	}
+    }
+
   if (GET_CODE (inner) == SUBREG
       && subreg_lowpart_p (inner)
       && (paradoxical_subreg_p (inner)
@@ -7663,7 +7678,7 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 			     0, 0, len - INTVAL (XEXP (inner, 1)),
 			     unsignedp, in_dest, in_compare);
       if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	new_rtx = gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
     }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
@@ -7673,6 +7688,17 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 	   && known_le (pos + len, GET_MODE_PRECISION (is_mode)))
     inner = XEXP (inner, 0);
 
+  if (rewrote_mult_p)
+    {
+      /* If we rewrote MULT -> ASHIFT, convert it back now.  */
+      rtx x = new_rtx ? new_rtx : inner;
+      PUT_CODE (x, MULT);
+      INTVAL (XEXP (x, 1)) = 1 << INTVAL (XEXP (x, 1));
+    }
+
+  if (new_rtx)
+    return new_rtx;
+
   inner_mode = GET_MODE (inner);
 
   /* See if this can be done without an extraction.  We never can if the
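The draft above leans on a single arithmetic identity: multiplying by a power of two computes the same value as shifting left by its log2, which is why `(mult X (const_int 2^n))` can be temporarily treated as `(ashift X (const_int n))`. A minimal standalone sketch of that identity (plain C, not GCC code):

```c
#include <assert.h>

/* The (mult x 2^n) form: multiply by a power-of-two constant,
   as canonicalised inside mem addresses.  */
static unsigned long long
mul_pow2 (unsigned long long x, unsigned int n)
{
  return x * (1ULL << n);
}

/* The (ashift x n) form: the shift the rewrite substitutes.  */
static unsigned long long
shl (unsigned long long x, unsigned int n)
{
  return x << n;
}
```

Both functions are illustrative names, not GCC internals; they exist only to show that the two RTL forms denote the same computation.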
Alex Coplan Oct. 27, 2020, 11:49 a.m. UTC | #16
On 27/10/2020 10:35, Alex Coplan via Gcc-patches wrote:
> On 26/10/2020 12:43, Segher Boessenkool wrote:
> > I do not like handling both mult and ashift in one case like this, it
> > complicates things for no good reason.  Write it as two cases, and it
> > should be good.
> 
> OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> of make_extraction so that the existing ASHIFT block can do the work for
> us. We remember if we did it and then convert it back if necessary.
> 
> I'm not convinced that it's an improvement. What do you think?
> 
> Bootstrap/regtest in progress on aarch64-none-linux-gnu. I'll test other
> platforms (as well as testing on top of 1/2) and repost with a proper
> commit message if you think it looks good.

This passes bootstrap and regtest on aarch64, FWIW.

Alex

> 
> gcc/ChangeLog:
> 
> 	(make_extraction): Temporarily rewrite (mult x 2^n) so that we
> 	can handle it as (ashift x n) and avoid emitting an extract where
> 	extend+shift will suffice.
Segher Boessenkool Oct. 27, 2020, 10:31 p.m. UTC | #17
On Tue, Oct 27, 2020 at 10:35:59AM +0000, Alex Coplan wrote:
> On 26/10/2020 12:43, Segher Boessenkool wrote:
> > I do not like handling both mult and ashift in one case like this, it
> > complicates things for no good reason.  Write it as two cases, and it
> > should be good.
> 
> OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> of make_extraction so that the existing ASHIFT block can do the work for
> us. We remember if we did it and then convert it back if necessary.
> 
> I'm not convinced that it's an improvement. What do you think?

Restoring it like that is just yuck.  That can be okay if it is as the
start and end of a smallish function, importantly some self-contained
piece of code, but this is not.

Just write it as two blocks? One handling the shift, that is already
there; and add one block adding the mult case.  That should not
increase the complexity of this already way too complex code.


Segher
Alex Coplan Oct. 28, 2020, 9:09 a.m. UTC | #18
On 27/10/2020 17:31, Segher Boessenkool wrote:
> On Tue, Oct 27, 2020 at 10:35:59AM +0000, Alex Coplan wrote:
> > On 26/10/2020 12:43, Segher Boessenkool wrote:
> > > I do not like handling both mult and ashift in one case like this, it
> > > complicates things for no good reason.  Write it as two cases, and it
> > > should be good.
> > 
> > OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> > of make_extraction so that the existing ASHIFT block can do the work for
> > us. We remember if we did it and then convert it back if necessary.
> > 
> > I'm not convinced that it's an improvement. What do you think?
> 
> Restoring it like that is just yuck.  That can be okay if it is as the
> start and end of a smallish function, importantly some self-contained
> piece of code, but this is not.
> 
> Just write it as two blocks? One handling the shift, that is already
> there; and add one block adding the mult case.  That should not
> increase the complexity of this already way too complex code.

OK, how about the attached?

Bootstrap and regtest in progress on aarch64-none-linux-gnu.

Thanks,
Alex

---

gcc/ChangeLog:

	* combine.c (make_extraction): Also handle shifts written as
	(mult x 2^n); avoid creating an extract rtx for these.
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..5040dabff98 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       if (new_rtx != 0)
 	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
     }
+  else if (GET_CODE (inner) == MULT
+	   && CONST_INT_P (XEXP (inner, 1))
+	   && pos_rtx == 0 && pos == 0)
+    {
+      /* We're extracting the least significant bits of an rtx
+	 (mult X (const_int 2^C)), where LEN > C.  Extract the
+	 least significant (LEN - C) bits of X, giving an rtx
+	 whose mode is MODE, then multiply it by 2^C.  */
+      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+      if (shift_amt > 0 && len > shift_amt)
+	{
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				     0, 0, len - shift_amt,
+				     unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+	}
+    }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
 	      bits outside of is_mode, don't look through
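The new MULT block keys off `exact_log2` to recover the shift amount from the multiplier. A hypothetical stand-in (illustrative only; the real implementation lives in GCC's hwint.h) showing the contract the guard relies on, namely `n` for an exact power of two and `-1` otherwise:

```c
#include <assert.h>

/* Illustrative model of exact_log2's contract: returns n when
   x == 2^n, and -1 when x is zero or not a power of two.  The -1
   case is what makes the `shift_amt > 0` guard in the patch reject
   non-power-of-two multipliers (and mult by 1, which is not a
   disguised shift worth handling).  */
static int
my_exact_log2 (unsigned long long x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x >>= 1)
    n++;
  return n;
}
```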
Alex Coplan Oct. 28, 2020, 11:13 a.m. UTC | #19
On 28/10/2020 09:09, Alex Coplan via Gcc-patches wrote:
> On 27/10/2020 17:31, Segher Boessenkool wrote:
> > On Tue, Oct 27, 2020 at 10:35:59AM +0000, Alex Coplan wrote:
> > > On 26/10/2020 12:43, Segher Boessenkool wrote:
> > > > I do not like handling both mult and ashift in one case like this, it
> > > > complicates things for no good reason.  Write it as two cases, and it
> > > > should be good.
> > > 
> > > OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> > > of make_extraction so that the existing ASHIFT block can do the work for
> > > us. We remember if we did it and then convert it back if necessary.
> > > 
> > > I'm not convinced that it's an improvement. What do you think?
> > 
> > Restoring it like that is just yuck.  That can be okay if it is as the
> > start and end of a smallish function, importantly some self-contained
> > piece of code, but this is not.
> > 
> > Just write it as two blocks? One handling the shift, that is already
> > there; and add one block adding the mult case.  That should not
> > increase the complexity of this already way too complex code.
> 
> OK, how about the attached?
> 
> Bootstrap and regtest in progress on aarch64-none-linux-gnu.

This fails bootstrap since we trigger -Wsign-compare without the cast to
unsigned HOST_WIDE_INT on shift_amt:

> +      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
> +      if (shift_amt > 0 && len > shift_amt)

So, quoting an earlier reply:

> > +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
> > +
> > +      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
> 
> Space after cast; better is to not need a cast at all (and you do not
> need one, len is unsigned HOST_WIDE_INT already).

unfortunately we do need the cast here.
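For reference, a standalone sketch (plain C, not GCC internals) of why the warning fires and why the cast is safe under the preceding guard: comparing a signed value against an unsigned one converts the signed side to unsigned, so a negative `shift_amt` (which `exact_log2` returns for a non-power-of-two) would wrap to a huge value.

```c
#include <assert.h>

/* Model of the guarded comparison from the patch.  `shift_amt > 0`
   is checked first, so by the time the cast happens shift_amt is
   known non-negative and the unsigned comparison is well-behaved.
   Without the short-circuit, a wrapped -1 would compare as a huge
   unsigned value.  */
static int
extract_ok (unsigned long long len, long long shift_amt)
{
  return shift_amt > 0 && len > (unsigned long long) shift_amt;
}
```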

See the revised patch attached, bootstrap on aarch64 in progress.

Thanks,
Alex

---

gcc/ChangeLog:

	* combine.c (make_extraction): Also handle shifts written as
	(mult x 2^n); avoid creating an extract rtx for these.
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..729d04b1d9e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       if (new_rtx != 0)
 	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
     }
+  else if (GET_CODE (inner) == MULT
+	   && CONST_INT_P (XEXP (inner, 1))
+	   && pos_rtx == 0 && pos == 0)
+    {
+      /* We're extracting the least significant bits of an rtx
+	 (mult X (const_int 2^C)), where LEN > C.  Extract the
+	 least significant (LEN - C) bits of X, giving an rtx
+	 whose mode is MODE, then multiply it by 2^C.  */
+      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT) shift_amt)
+	{
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				     0, 0, len - shift_amt,
+				     unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+	}
+    }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
 	      bits outside of is_mode, don't look through
Segher Boessenkool Oct. 28, 2020, 8:27 p.m. UTC | #20
Hi!

On Wed, Oct 28, 2020 at 11:13:28AM +0000, Alex Coplan wrote:
> This fails bootstrap since we trigger -Wsign-compare without the cast to
> unsigned HOST_WIDE_INT on shift_amt:
> 
> +  else if (GET_CODE (inner) == MULT
> +	   && CONST_INT_P (XEXP (inner, 1))
> +	   && pos_rtx == 0 && pos == 0)
> +    {
> +      /* We're extracting the least significant bits of an rtx
> +	 (mult X (const_int 2^C)), where LEN > C.  Extract the
> +	 least significant (LEN - C) bits of X, giving an rtx
> +	 whose mode is MODE, then multiply it by 2^C.  */
> +      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
> +      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT) shift_amt)

if (IN_RANGE (shift_amt, 1, len - 1))

> +	{
> +	  new_rtx = make_extraction (mode, XEXP (inner, 0),
> +				     0, 0, len - shift_amt,
> +				     unsignedp, in_dest, in_compare);
> +	  if (new_rtx)
> +	    return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
> +	}
> +    }
>    else if (GET_CODE (inner) == TRUNCATE
>  	   /* If trying or potentionally trying to extract
>  	      bits outside of is_mode, don't look through

Okay for trunk like that.  Thanks!


Segher
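Segher's `IN_RANGE (shift_amt, 1, len - 1)` suggestion folds both checks into a single unsigned subtract-and-compare, sidestepping the cast entirely. A sketch of the equivalence, assuming IN_RANGE's usual implementation (GCC defines it in system.h roughly along these lines; MY_IN_RANGE here is a stand-in) and a non-degenerate length of at least 2, which holds whenever the transform can fire:

```c
#include <assert.h>

/* Stand-in for GCC's IN_RANGE: one unsigned comparison covers both
   bounds, because a value below LO wraps to a huge unsigned number.  */
#define MY_IN_RANGE(v, lo, hi) \
  ((unsigned long long) (v) - (unsigned long long) (lo) \
   <= (unsigned long long) (hi) - (unsigned long long) (lo))

/* The two-comparison form the patch originally used, cast and all.  */
static int
old_form (unsigned long long len, long long shift_amt)
{
  return shift_amt > 0 && len > (unsigned long long) shift_amt;
}
```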

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..88f3925aee7 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@  expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@  make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 	is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
 	   && CONST_INT_P (XEXP (inner, 1))
-	   && pos_rtx == 0 && pos == 0
-	   && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-	 (ashift X (const_int C)), where LEN > C.  Extract the
-	 least significant (LEN - C) bits of X, giving an rtx
-	 whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-			     0, 0, len - INTVAL (XEXP (inner, 1)),
-			     unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	   && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+	{
+	  /* We're extracting the least significant bits of an rtx
+	     (ashift X (const_int C)) or (mult X (const_int (2^C))),
+	     where LEN > C.  Extract the least significant (LEN - C) bits
+	     of X, giving an rtx whose mode is MODE, then shift it left
+	     C times.  */
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				 0, 0, len - shift_amt,
+				 unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	}
     }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract