
[v2] combine: Don't turn (mult (extend x) 2^n) into extract [PR96998]

Message ID 20200930103931.raxfvhpuehrgulcp@arm.com
State New

Commit Message

Alex Coplan Sept. 30, 2020, 10:39 a.m. UTC
Currently, make_extraction() identifies where we can emit an ASHIFT of
an extend in place of an extraction, but fails to make the corresponding
canonicalization/simplification when presented with a MULT by a power of
two. Such a representation is canonical when representing a left-shifted
address inside a MEM.

This patch remedies this situation: after the patch, make_extraction()
now also identifies RTXs such as:

(mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))

and rewrites this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.

(This patch also fixes up a comment in expand_compound_operation() which
appears to have suffered from bitrot.)

This fixes PR96998: an ICE on AArch64 due to an unrecognised
sign_extract insn which was exposed by
r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf. That change
introduced a canonicalisation in LRA to rewrite mult to shift in address
reloads.

Prior to this patch, the flow was as follows. We start with the
following insn going into combine:

(insn 9 8 10 3 (set (mem:SI (plus:DI (mult:DI (reg:DI 98 [ g ])
                    (const_int 4 [0x4]))
                (reg/f:DI 96)) [3 *i_5+0 S4 A32])
        (asm_operands:SI ("") ("=Q") 0 []
             []
             [] test.c:11)) "test.c":11:5 -1
     (expr_list:REG_DEAD (reg:DI 98 [ g ])
        (nil)))

Then combine turns this into a sign_extract:

(insn 9 8 10 3 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                        (const_int 4 [0x4]))
                    (const_int 34 [0x22])
                    (const_int 0 [0]))
                (reg/f:DI 96)) [3 *i_5+0 S4 A32])
        (asm_operands:SI ("") ("=Q") 0 []
             []
             [] test.c:11)) "test.c":11:5 -1
     (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
        (nil)))

Then LRA reloads the address and (prior to the LRA change) we get:

(insn 32 8 9 3 (set (reg:DI 0 x0 [100])
        (plus:DI (sign_extract:DI (mult:DI (reg:DI 0 x0 [orig:92 g ] [92])
                    (const_int 4 [0x4]))
                (const_int 34 [0x22])
                (const_int 0 [0]))
            (reg/f:DI 19 x19 [96]))) "test.c":11:5 283 {*add_extvdi_multp2}
     (nil))
(insn 9 32 10 3 (set (mem:SI (reg:DI 0 x0 [100]) [3 *i_5+0 S4 A32])
        (asm_operands:SI ("") ("=Q") 0 []
             []
             [] test.c:11)) "test.c":11:5 -1
     (nil))

Now observe that insn 32 here is not canonical: firstly, we should be
using an ASHIFT by 2 instead of a MULT by 4, since we're outside of a
MEM. Indeed, the LRA change remedies this, and support for such insns in
the AArch64 backend was dropped in
r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8.

Now the reason we ICE after the LRA change here is that AArch64 has
never supported the ASHIFT variant of this sign_extract insn. Inspecting
the unrecognised reloaded insn confirms this:

(gdb) p debug(insn)
(insn 33 8 34 3 (set (reg:DI 100)
        (sign_extract:DI (ashift:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                (const_int 2 [0x2]))
            (const_int 34 [0x22])
            (const_int 0 [0]))) "test.c":11:5 -1
     (nil))

The thesis of this patch is that combine should _never_ be producing
such an insn. Clearly this should be canonicalised as an extend
operation instead (as combine already does in make_extraction() for the
ASHIFT form). After this change to combine, we get:

(insn 9 8 10 3 (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92 [ g ]))
                    (const_int 4 [0x4]))
                (reg/f:DI 96)) [3 *i_5+0 S4 A32])
        (asm_operands:SI ("") ("=Q") 0 []
             []
             [] test.c:11)) "test.c":11:5 -1
     (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
        (nil)))

coming out of combine, and LRA can happily reload the address:

(insn 32 8 9 3 (set (reg:DI 0 x0 [100])
        (plus:DI (ashift:DI (sign_extend:DI (reg/v:SI 0 x0 [orig:92 g ] [92]))
                (const_int 2 [0x2]))
            (reg/f:DI 19 x19 [96]))) "test.c":11:5 245 {*add_extendsi_shft_di}
     (nil))
(insn 9 32 10 3 (set (mem:SI (reg:DI 0 x0 [100]) [3 *i_5+0 S4 A32])
        (asm_operands:SI ("") ("=Q") 0 []
             []
             [] test.c:11)) "test.c":11:5 -1
     (nil))

and all is well, with nice simple and canonical RTL being used
throughout.

Testing:
 * Bootstrap and regtest on aarch64-linux-gnu, arm-linux-gnueabihf, and
   x86-linux-gnu in progress.

OK for trunk (with AArch64 changes discussed here [0] as a follow-on
patch) provided it passes testing?

Thanks,
Alex

[0]: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html

---

gcc/ChangeLog:

	PR target/96998
	* combine.c (expand_compound_operation): Tweak variable name in
	comment to match source.
	(make_extraction): Handle mult by power of two in addition to
	ashift.

gcc/testsuite/ChangeLog:

	PR target/96998
	* gcc.c-torture/compile/pr96998.c: New test.

Comments

Alex Coplan Oct. 8, 2020, 10:21 a.m. UTC | #1
Ping. The kernel is still broken on AArch64.

On 30/09/2020 11:39, Alex Coplan via Gcc-patches wrote:
> [...]
Segher Boessenkool Oct. 8, 2020, 8:20 p.m. UTC | #2
On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> Ping. The kernel is still broken on AArch64.

You *cannot* fix a correctness bug with a combine addition.  So please
fix the target bug first.

I haven't had time to look at your patch yet, sorry.


Segher
Alex Coplan Oct. 9, 2020, 8:38 a.m. UTC | #3
Hi Segher,

On 08/10/2020 15:20, Segher Boessenkool wrote:
> On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> > Ping. The kernel is still broken on AArch64.
> 
> You *cannot* fix a correctness bug with a combine addition.

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
explains why we do precisely that.

Also, as your own testing confirmed, the patch does indeed fix the issue.

> So please fix the target bug first.

I think the problem here -- the reason that we're talking past each
other -- is that there are (at least) two parts of the codebase that can
be blamed for the ICE here:

1. aarch64: "The insn is unrecognized, so it's a backend bug
(missing pattern or similar)."

2. combine: "Combine produces a non-canonical insn, so the backend
(correctly) doesn't recognise it, and combine is at fault."

Now I initially (naively) took interpretation 1 here and tried to fix
the ICE by adding a pattern to recognise the sign_extract insn that
combine is producing here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553605.html

However, in the review of that patch, Richard opened my eyes to
interpretation 2, which in hindsight is clearly a better way to fix the
issue.

Combine already does the canonicalisation for the (ashift x n) case, so
it seems like an obvious improvement to do the same for the (mult x 2^n)
case, as this is how shifts are represented inside mem rtxes.

Again, please see Richard's comments here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html

> 
> I haven't had time to look at your patch yet, sorry.

Not to worry. Hopefully this clears up any confusion around what we're
trying to do here and why.

Thanks,
Alex
Segher Boessenkool Oct. 9, 2020, 11:02 p.m. UTC | #4
On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
> Hi Segher,
> 
> On 08/10/2020 15:20, Segher Boessenkool wrote:
> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> > > Ping. The kernel is still broken on AArch64.
> > 
> > You *cannot* fix a correctness bug with a combine addition.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
> explains why we do precisely that.

And it still is wrong.

> Also, as your own testing confirmed, the patch does indeed fix the issue.

No, it did not.  It showed that before the patch the bug was hit, and
after it, it was not.  It does not show the bug was solved.

> > So please fix the target bug first.
> 
> I think the problem here -- the reason that we're talking past each
> other -- is that there are (at least) two parts of the codebase that can
> be blamed for the ICE here:
> 
> 1. aarch64: "The insn is unrecognized, so it's a backend bug
> (missing pattern or similar)."
> 
> 2. combine: "Combine produces a non-canonical insn, so the backend
> (correctly) doesn't recognise it, and combine is at fault."

That cannot cause ICEs!  *That* is the issue.  It is *normal* for insns
combine comes up with to not be allowed by the backend (or recognised
even); in that case, that instruction combination is simply not made.
The code was valid before, and stays valid.  That is all that combine
does: it may or may not change some instructions to semantically
equivalent instructions that the cost model thinks are better to have.

You *cannot* depend on combine to make *any* particular combination or
simplification or anything.  There are hundreds of reasons to abort a
combination.

3.  The insn is not valid for the target, so the target does *not*
recognise it, and combine does *not* make that combination, *and all is
good*.

That is how it is supposed to work (and how it *does* work).

If the input to combine contains invalid instructions, you have to fix a
bug in some earlier pass (or the backend).  All instructions in the
instruction stream have to be valid (some time shortly after expand, and
passes can do crazy stuff internally, so let's say "between passes").

> Now I initially (naively) took interpretation 1 here and tried to fix
> the ICE by adding a pattern to recognise the sign_extract insn that
> combine is producing here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553605.html
> 
> However, in the review of that patch, Richard opened my eyes to
> interpretation 2, which in hindsight is clearly a better way to fix the
> issue.
> 
> Combine already does the canonicalisation for the (ashift x n) case, so
> it seems like an obvious improvement to do the same for the (mult x 2^n)
> case, as this is how shifts are represented inside mem rtxes.

An improvement perhaps, but there is still a bug in the backend.  It is
*normal* for combine to create code invalid for the target, including
non-canonical code.  You should not recognise non-canonical code if you
cannot actually handle it correctly.

> Again, please see Richard's comments here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
> 
> > 
> > I haven't had time to look at your patch yet, sorry.
> 
> Not to worry. Hopefully this clears up any confusion around what we're
> trying to do here and why.

It doesn't, unfortunately.  But maybe you understand now?


Segher
Richard Sandiford Oct. 12, 2020, 4:19 p.m. UTC | #5
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
>> Hi Segher,
>> 
>> On 08/10/2020 15:20, Segher Boessenkool wrote:
>> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
>> > > Ping. The kernel is still broken on AArch64.
>> > 
>> > You *cannot* fix a correctness bug with a combine addition.
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
>> explains why we do precisely that.
>
> And it still is wrong.
>
>> Also, as your own testing confirmed, the patch does indeed fix the issue.
>
> No, it did not.  It showed that before the patch the bug was hit, and
> after it, it was not.  It does not show the bug was solved.

I agree there's a target bug here.  Please see the explanation I posted
in: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
(especially the first sentence quoted below :-)).

    The situation as things stand is that aarch64 has a bug: it accepts
    an odd sign_extract representation of addresses, but doesn't accept
    that same odd form of address as an LEA.  We have two options:

    (a) add back instructions that recognise the odd form of LEA, or
    (b) remove the code that accepts the odd addresses

    I think (b) is the way to go here.  But doing that on its own
    would regress code quality.  The reason we recognised the odd
    addresses in the first place was because that was the rtl that
    combine happened to generate for an important case.

So if we go for (b) but fix the aarch64 bug strictly before the
combine patch, we would need to:

(1) Apply the target fix and adjust the testsuite markup to make sure
    that the git commit doesn't regress anyone's test results.

(2) Apply the combine patch and revert the testsuite markup changes
    from (1).

That seems like make-work, and would still show as a blip for
people doing performance tracking.

If you prefer, we could fix the aarch64 bug and patch combine as a
single commit.  See:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html

for the full patch, including the aarch64 bugfix.

Thanks,
Richard
Segher Boessenkool Oct. 12, 2020, 5:14 p.m. UTC | #6
On Mon, Oct 12, 2020 at 05:19:58PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
> >> Hi Segher,
> >> 
> >> On 08/10/2020 15:20, Segher Boessenkool wrote:
> >> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> >> > > Ping. The kernel is still broken on AArch64.
> >> > 
> >> > You *cannot* fix a correctness bug with a combine addition.
> >> 
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
> >> explains why we do precisely that.
> >
> > And it still is wrong.
> >
> >> Also, as your own testing confirmed, the patch does indeed fix the issue.
> >
> > No, it did not.  It showed that before the patch the bug was hit, and
> > after it, it was not.  It does not show the bug was solved.
> 
> I agree there's a target bug here.  Please see the explanation I posted
> in: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
> (especially the first sentence quoted below :-)).
> 
>     The situation as things stand is that aarch64 has a bug: it accepts
>     an odd sign_extract representation of addresses, but doesn't accept
>     that same odd form of address as an LEA.  We have two options:
> 
>     (a) add back instructions that recognise the odd form of LEA, or
>     (b) remove the code that accepts the odd addresses
> 
>     I think (b) is the way to go here.

Either seems to be fine.

>     But doing that on its own
>     would regress code quality.  The reason we recognised the odd
>     addresses in the first place was because that was the rtl that
>     combine happened to generate for an important case.
> 
> So if we go for (b) but fix the aarch64 bug strictly before the
> combine patch, we would need to:

This is necessary to be able to evaluate what such a combine patch
does in practice -- so there is no other way.

> (1) Apply the target fix and adjust the testsuite markup to make sure
>     that the git commit doesn't regress anyone's test results.

It is normal to regress the testsuite for a little while.

> (2) Apply the combine patch and revert the testsuite markup changes
>     from (1).
> 
> That seems like make-work, and would still show as a blip for
> people doing performance tracking.

Yes, that is make-work.  Just regress the testsuite.

You do not even have to apply the target patch first (but you need to
send it as separate patch, so that other people can test it!)

> If you prefer, we could fix the aarch64 bug and patch combine as a
> single commit.  See:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html
> 
> for the full patch, including the aarch64 bugfix.

I need separate patches, so that I can see what the current combine
does, without ICEing all over.  That is all.  Send it as a series of two
patches, or something.


Segher
Alex Coplan Oct. 15, 2020, 9:14 a.m. UTC | #7
Hi Segher,

On 12/10/2020 12:14, Segher Boessenkool wrote:
> On Mon, Oct 12, 2020 at 05:19:58PM +0100, Richard Sandiford wrote:
> > 
> > I agree there's a target bug here.  Please see the explanation I posted
> > in: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
> > (especially the first sentence quoted below :-)).
> > 
> >     The situation as things stand is that aarch64 has a bug: it accepts
> >     an odd sign_extract representation of addresses, but doesn't accept
> >     that same odd form of address as an LEA.  We have two options:
> > 
> >     (a) add back instructions that recognise the odd form of LEA, or
> >     (b) remove the code that accepts the odd addresses
> > 
> >     I think (b) is the way to go here.
> 
> Either seems to be fine.
> 
> >     But doing that on its own
> >     would regress code quality.  The reason we recognised the odd
> >     addresses in the first place was because that was the rtl that
> >     combine happened to generate for an important case.
> > 
> > So if we go for (b) but fix the aarch64 bug strictly before the
> > combine patch, we would need to:
> 
> This is necessary to be able to evaluate what such a combine patch
> does in practice -- so there is no other way.
> 
> > (1) Apply the target fix and adjust the testsuite markup to make sure
> >     that the git commit doesn't regress anyone's test results.
> 
> It is normal to regress the testsuite for a little while.
> 
> > (2) Apply the combine patch and revert the testsuite markup changes
> >     from (1).
> > 
> > That seems like make-work, and would still show as a blip for
> > people doing performance tracking.
> 
> Yes, that is make-work.  Just regress the testsuite.
> 
> You do not even have to apply the target patch first (but you need to
> send it as separate patch, so that other people can test it!)
> 
> > If you prefer, we could fix the aarch64 bug and patch combine as a
> > single commit.  See:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html
> > 
> > for the full patch, including the aarch64 bugfix.
> 
> I need separate patches, so that I can see what the current combine
> does, without ICEing all over.  That is all.  Send it as a series of two
> patches, or something.

Apologies for the misunderstanding and my inaccurate summary in the previous
email; there is, of course, a target bug here as well.

I've sent the patches as a two-patch series where the first patch fixes the
AArch64 bug and the second restores code quality with the patch to combine:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556237.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556238.html

Thanks,
Alex

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index c88382efbd3..fe8eff2b464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@  expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@  make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
 	is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
 	   && CONST_INT_P (XEXP (inner, 1))
-	   && pos_rtx == 0 && pos == 0
-	   && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-	 (ashift X (const_int C)), where LEN > C.  Extract the
-	 least significant (LEN - C) bits of X, giving an rtx
-	 whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-			     0, 0, len - INTVAL (XEXP (inner, 1)),
-			     unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	   && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+	{
+	  /* We're extracting the least significant bits of an rtx
+	     (ashift X (const_int C)) or (mult X (const_int (2^C))),
+	     where LEN > C.  Extract the least significant (LEN - C) bits
+	     of X, giving an rtx whose mode is MODE, then shift it left
+	     C times.  */
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				 0, 0, len - shift_amt,
+				 unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	}
     }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr96998.c b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
new file mode 100644
index 00000000000..a75d5dcfe08
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target arm*-*-* aarch64*-*-* } } */
+
+int h(void);
+struct c d;
+struct c {
+  int e[1];
+};
+
+void f(void) {
+  int g;
+  for (;; g = h()) {
+    int *i = &d.e[g];
+    asm("" : "=Q"(*i));
+  }
+}