Message ID: 20200930103931.raxfvhpuehrgulcp@arm.com
State: New
Series: [v2] combine: Don't turn (mult (extend x) 2^n) into extract [PR96998]
Ping. The kernel is still broken on AArch64.

On 30/09/2020 11:39, Alex Coplan via Gcc-patches wrote:
> Currently, make_extraction() identifies where we can emit an ASHIFT of
> an extend in place of an extraction, but fails to make the corresponding
> canonicalization/simplification when presented with a MULT by a power of
> two. Such a representation is canonical when representing a left-shifted
> address inside a MEM.
>
> This patch remedies this situation: after the patch, make_extraction()
> now also identifies RTXs such as:
>
>   (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
>
> and rewrites this as:
>
>   (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
>
> instead of using a sign_extract.
>
> (This patch also fixes up a comment in expand_compound_operation() which
> appears to have suffered from bitrot.)
>
> This fixes PR96998: an ICE on AArch64 due to an unrecognised
> sign_extract insn which was exposed by
> r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf. That change
> introduced a canonicalisation in LRA to rewrite mult to shift in
> address reloads.
>
> Prior to this patch, the flow was as follows. We start with the
> following insn going into combine:
>
> (insn 9 8 10 3 (set (mem:SI (plus:DI (mult:DI (reg:DI 98 [ g ])
>                     (const_int 4 [0x4]))
>                 (reg/f:DI 96)) [3 *i_5+0 S4 A32])
>         (asm_operands:SI ("") ("=Q") 0 []
>              []
>              [] test.c:11)) "test.c":11:5 -1
>      (expr_list:REG_DEAD (reg:DI 98 [ g ])
>         (nil)))
>
> Then combine turns this into a sign_extract:
>
> (insn 9 8 10 3 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
>                         (const_int 4 [0x4]))
>                     (const_int 34 [0x22])
>                     (const_int 0 [0]))
>                 (reg/f:DI 96)) [3 *i_5+0 S4 A32])
>         (asm_operands:SI ("") ("=Q") 0 []
>              []
>              [] test.c:11)) "test.c":11:5 -1
>      (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
>         (nil)))
>
> Then LRA reloads the address and (prior to the LRA change) we get:
>
> (insn 32 8 9 3 (set (reg:DI 0 x0 [100])
>         (plus:DI (sign_extract:DI (mult:DI (reg:DI 0 x0 [orig:92 g ] [92])
>                     (const_int 4 [0x4]))
>                 (const_int 34 [0x22])
>                 (const_int 0 [0]))
>             (reg/f:DI 19 x19 [96]))) "test.c":11:5 283 {*add_extvdi_multp2}
>      (nil))
> (insn 9 32 10 3 (set (mem:SI (reg:DI 0 x0 [100]) [3 *i_5+0 S4 A32])
>         (asm_operands:SI ("") ("=Q") 0 []
>              []
>              [] test.c:11)) "test.c":11:5 -1
>      (nil))
>
> Now observe that insn 32 here is not canonical: firstly, we should be
> using an ASHIFT by 2 instead of a MULT by 4, since we're outside of a
> MEM. Indeed, the LRA change remedies this, and support for such insns
> in the AArch64 backend was dropped in
> r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8.
>
> Now the reason we ICE after the LRA change here is that AArch64 has
> never supported the ASHIFT variant of this sign_extract insn.
> Inspecting the unrecognised reloaded insn confirms this:
>
> (gdb) p debug(insn)
> (insn 33 8 34 3 (set (reg:DI 100)
>         (sign_extract:DI (ashift:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
>                 (const_int 2 [0x2]))
>             (const_int 34 [0x22])
>             (const_int 0 [0]))) "test.c":11:5 -1
>      (nil))
>
> The thesis of this patch is that combine should _never_ be producing
> such an insn. Clearly this should be canonicalised as an extend
> operation instead (as combine already does in make_extraction() for
> the ASHIFT form). After this change to combine, we get:
>
> (insn 9 8 10 3 (set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92 [ g ]))
>                     (const_int 4 [0x4]))
>                 (reg/f:DI 96)) [3 *i_5+0 S4 A32])
>         (asm_operands:SI ("") ("=Q") 0 []
>              []
>              [] test.c:11)) "test.c":11:5 -1
>      (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
>         (nil)))
>
> coming out of combine, and LRA can happily reload the address:
>
> (insn 32 8 9 3 (set (reg:DI 0 x0 [100])
>         (plus:DI (ashift:DI (sign_extend:DI (reg/v:SI 0 x0 [orig:92 g ] [92]))
>                 (const_int 2 [0x2]))
>             (reg/f:DI 19 x19 [96]))) "test.c":11:5 245 {*add_extendsi_shft_di}
>      (nil))
> (insn 9 32 10 3 (set (mem:SI (reg:DI 0 x0 [100]) [3 *i_5+0 S4 A32])
>         (asm_operands:SI ("") ("=Q") 0 []
>              []
>              [] test.c:11)) "test.c":11:5 -1
>      (nil))
>
> and all is well, with nice simple and canonical RTL being used
> throughout.
>
> Testing:
>  * Bootstrap and regtest on aarch64-linux-gnu, arm-linux-gnueabihf,
>    and x86-linux-gnu in progress.
>
> OK for trunk (with AArch64 changes discussed here [0] as a follow-on
> patch) provided it passes testing?
>
> Thanks,
> Alex
>
> [0]: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html
>
> ---
>
> gcc/ChangeLog:
>
>     PR target/96998
>     * combine.c (expand_compound_operation): Tweak variable name in
>     comment to match source.
>     (make_extraction): Handle mult by power of two in addition to
>     ashift.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/96998
>     * gcc.c-torture/compile/pr96998.c: New test.
On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> Ping. The kernel is still broken on AArch64.

You *cannot* fix a correctness bug with a combine addition. So please
fix the target bug first.

I haven't had time to look at your patch yet, sorry.


Segher
Hi Segher,

On 08/10/2020 15:20, Segher Boessenkool wrote:
> On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> > Ping. The kernel is still broken on AArch64.
>
> You *cannot* fix a correctness bug with a combine addition.

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
explains why we do precisely that. Also, as your own testing confirmed,
the patch does indeed fix the issue.

> So please fix the target bug first.

I think the problem here -- the reason that we're talking past each
other -- is that there are (at least) two parts of the codebase that can
be blamed for the ICE here:

 1. aarch64: "The insn is unrecognized, so it's a backend bug
    (missing pattern or similar)."

 2. combine: "Combine produces a non-canonical insn, so the backend
    (correctly) doesn't recognise it, and combine is at fault."

Now I initially (naively) took interpretation 1 here and tried to fix
the ICE by adding a pattern to recognise the sign_extract insn that
combine is producing here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553605.html

However, in the review of that patch, Richard opened my eyes to
interpretation 2, which in hindsight is clearly a better way to fix the
issue.

Combine already does the canonicalisation for the (ashift x n) case, so
it seems like an obvious improvement to do the same for the (mult x 2^n)
case, as this is how shifts are represented inside mem rtxes. Again,
please see Richard's comments here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html

> I haven't had time to look at your patch yet, sorry.

Not to worry. Hopefully this clears up any confusion around what we're
trying to do here and why.

Thanks,
Alex
On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
> Hi Segher,
>
> On 08/10/2020 15:20, Segher Boessenkool wrote:
> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> > > Ping. The kernel is still broken on AArch64.
> >
> > You *cannot* fix a correctness bug with a combine addition.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
> explains why we do precisely that.

And it still is wrong.

> Also, as your own testing confirmed, the patch does indeed fix the issue.

No, it did not. It showed that before the patch the bug was hit, and
after it it was not. It does not show the bug was solved.

> > So please fix the target bug first.
>
> I think the problem here -- the reason that we're talking past each
> other -- is that there are (at least) two parts of the codebase that can
> be blamed for the ICE here:
>
>  1. aarch64: "The insn is unrecognized, so it's a backend bug
>     (missing pattern or similar)."
>
>  2. combine: "Combine produces a non-canonical insn, so the backend
>     (correctly) doesn't recognise it, and combine is at fault."

That cannot cause ICEs! *That* is the issue.

It is *normal* for insns combine comes up with to not be allowed by the
backend (or recognised even); in that case, that instruction combination
is simply not made. The code was valid before, and stays valid. That is
all that combine does: it may or may not change some instructions to
semantically equivalent instructions that the cost model thinks are
better to have. You *cannot* depend on combine to make *any* particular
combination or simplification or anything. There are hundreds of reasons
to abort a combination.

 3. The insn is not valid for the target, so the target does *not*
    recognise it, and combine does *not* make that combination, *and
    all is good*.

That is how it is supposed to work (and how it *does* work). If the
input to combine contains invalid instructions, you have to fix a bug
in some earlier pass (or the backend). All instructions in the
instruction stream have to be valid (some time shortly after expand,
and passes can do crazy stuff internally, so let's say "between
passes").

> Now I initially (naively) took interpretation 1 here and tried to fix
> the ICE by adding a pattern to recognise the sign_extract insn that
> combine is producing here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553605.html
>
> However, in the review of that patch, Richard opened my eyes to
> interpretation 2, which in hindsight is clearly a better way to fix the
> issue.
>
> Combine already does the canonicalisation for the (ashift x n) case, so
> it seems like an obvious improvement to do the same for the (mult x 2^n)
> case, as this is how shifts are represented inside mem rtxes.

An improvement perhaps, but there is still a bug in the backend. It is
*normal* for combine to create code invalid for the target, including
non-canonical code. You should not recognise non-canonical code if you
cannot actually handle it correctly.

> Again, please see Richard's comments here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
>
> > I haven't had time to look at your patch yet, sorry.
>
> Not to worry. Hopefully this clears up any confusion around what we're
> trying to do here and why.

It doesn't, unfortunately. But maybe you understand now?


Segher
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
>> Hi Segher,
>>
>> On 08/10/2020 15:20, Segher Boessenkool wrote:
>> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
>> > > Ping. The kernel is still broken on AArch64.
>> >
>> > You *cannot* fix a correctness bug with a combine addition.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
>> explains why we do precisely that.
>
> And it still is wrong.
>
>> Also, as your own testing confirmed, the patch does indeed fix the issue.
>
> No, it did not. It showed that before the patch the bug was hit, and
> after it it was not. It does not show the bug was solved.

I agree there's a target bug here. Please see the explanation I posted
in: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
(especially the first sentence quoted below :-)).

The situation as things stand is that aarch64 has a bug: it accepts
an odd sign_extract representation of addresses, but doesn't accept
that same odd form of address as an LEA. We have two options:

(a) add back instructions that recognise the odd form of LEA, or
(b) remove the code that accepts the odd addresses

I think (b) is the way to go here. But doing that on its own
would regress code quality. The reason we recognised the odd
addresses in the first place was because that was the rtl that
combine happened to generate for an important case.

So if we go for (b) but fix the aarch64 bug strictly before the
combine patch, we would need to:

(1) Apply the target fix and adjust the testsuite markup to make sure
    that the git commit doesn't regress anyone's test results.

(2) Apply the combine patch and revert the testsuite markup changes
    from (1).

That seems like make-work, and would still show as a blip for
people doing performance tracking. If you prefer, we could fix the
aarch64 bug and patch combine as a single commit. See:

  https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html

for the full patch, including the aarch64 bugfix.

Thanks,
Richard
On Mon, Oct 12, 2020 at 05:19:58PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
> >> Hi Segher,
> >>
> >> On 08/10/2020 15:20, Segher Boessenkool wrote:
> >> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> >> > > Ping. The kernel is still broken on AArch64.
> >> >
> >> > You *cannot* fix a correctness bug with a combine addition.
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
> >> explains why we do precisely that.
> >
> > And it still is wrong.
> >
> >> Also, as your own testing confirmed, the patch does indeed fix the issue.
> >
> > No, it did not. It showed that before the patch the bug was hit, and
> > after it it was not. It does not show the bug was solved.
>
> I agree there's a target bug here. Please see the explanation I posted
> in: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
> (especially the first sentence quoted below :-)).
>
> The situation as things stand is that aarch64 has a bug: it accepts
> an odd sign_extract representation of addresses, but doesn't accept
> that same odd form of address as an LEA. We have two options:
>
> (a) add back instructions that recognise the odd form of LEA, or
> (b) remove the code that accepts the odd addresses
>
> I think (b) is the way to go here.

Either seems to be fine.

> But doing that on its own
> would regress code quality. The reason we recognised the odd
> addresses in the first place was because that was the rtl that
> combine happened to generate for an important case.
>
> So if we go for (b) but fix the aarch64 bug strictly before the
> combine patch, we would need to:

This is necessary to be able to evaluate what such a combine patch
does in practice -- so there is no other way.

> (1) Apply the target fix and adjust the testsuite markup to make sure
>     that the git commit doesn't regress anyone's test results.

It is normal to regress the testsuite for a little while.

> (2) Apply the combine patch and revert the testsuite markup changes
>     from (1).
>
> That seems like make-work, and would still show as a blip for
> people doing performance tracking.

Yes, that is make-work. Just regress the testsuite.

You do not even have to apply the target patch first (but you need to
send it as separate patch, so that other people can test it!)

> If you prefer, we could fix the aarch64 bug and patch combine as a
> single commit. See:
>
>   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html
>
> for the full patch, including the aarch64 bugfix.

I need separate patches, so that I can see what the current combine
does, without ICEing all over. That is all. Send it as a series of two
patches, or something.


Segher
Hi Segher,

On 12/10/2020 12:14, Segher Boessenkool wrote:
> On Mon, Oct 12, 2020 at 05:19:58PM +0100, Richard Sandiford wrote:
> >
> > I agree there's a target bug here. Please see the explanation I posted
> > in: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
> > (especially the first sentence quoted below :-)).
> >
> > The situation as things stand is that aarch64 has a bug: it accepts
> > an odd sign_extract representation of addresses, but doesn't accept
> > that same odd form of address as an LEA. We have two options:
> >
> > (a) add back instructions that recognise the odd form of LEA, or
> > (b) remove the code that accepts the odd addresses
> >
> > I think (b) is the way to go here.
>
> Either seems to be fine.
>
> > But doing that on its own
> > would regress code quality. The reason we recognised the odd
> > addresses in the first place was because that was the rtl that
> > combine happened to generate for an important case.
> >
> > So if we go for (b) but fix the aarch64 bug strictly before the
> > combine patch, we would need to:
>
> This is necessary to be able to evaluate what such a combine patch
> does in practice -- so there is no other way.
>
> > (1) Apply the target fix and adjust the testsuite markup to make sure
> >     that the git commit doesn't regress anyone's test results.
>
> It is normal to regress the testsuite for a little while.
>
> > (2) Apply the combine patch and revert the testsuite markup changes
> >     from (1).
> >
> > That seems like make-work, and would still show as a blip for
> > people doing performance tracking.
>
> Yes, that is make-work. Just regress the testsuite.
>
> You do not even have to apply the target patch first (but you need to
> send it as separate patch, so that other people can test it!)
>
> > If you prefer, we could fix the aarch64 bug and patch combine as a
> > single commit. See:
> >
> >   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554257.html
> >
> > for the full patch, including the aarch64 bugfix.
>
> I need separate patches, so that I can see what the current combine
> does, without ICEing all over. That is all. Send it as a series of two
> patches, or something.

Apologies for the misunderstanding and my inaccurate summary in the
previous email; there is of course a target bug here as well.

I've sent the patches as a two-patch series where the first patch fixes
the AArch64 bug and the second restores code quality with the patch to
combine:

https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556237.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556238.html

Thanks,
Alex
diff --git a/gcc/combine.c b/gcc/combine.c
index c88382efbd3..fe8eff2b464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
      logical depending on the value of UNSIGNEDP.
 
      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
 	   && CONST_INT_P (XEXP (inner, 1))
-	   && pos_rtx == 0 && pos == 0
-	   && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-	 (ashift X (const_int C)), where LEN > C.  Extract the
-	 least significant (LEN - C) bits of X, giving an rtx
-	 whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-				 0, 0, len - INTVAL (XEXP (inner, 1)),
-				 unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	   && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+	{
+	  /* We're extracting the least significant bits of an rtx
+	     (ashift X (const_int C)) or (mult X (const_int (2^C))),
+	     where LEN > C.  Extract the least significant (LEN - C) bits
+	     of X, giving an rtx whose mode is MODE, then shift it left
+	     C times.  */
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				     0, 0, len - shift_amt,
+				     unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	}
     }
   else if (GET_CODE (inner) == TRUNCATE
 	   /* If trying or potentionally trying to extract
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr96998.c b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
new file mode 100644
index 00000000000..a75d5dcfe08
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr96998.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target arm*-*-* aarch64*-*-* } } */
+
+int h(void);
+struct c d;
+struct c {
+  int e[1];
+};
+
+void f(void) {
+  int g;
+  for (;; g = h()) {
+    int *i = &d.e[g];
+    asm("" : "=Q"(*i));
+  }
+}