Message ID | 7794A52CE4D579448B959EED7DD0A4723DCE9F34@satlexdag06.amd.com |
---|---|
State | New |
Hello Venkat,

On Wed, Apr 29, 2015 at 09:25:21AM +0000, Kumar, Venkataramanan wrote:
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 5c763b4..945abdb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
>       but once inside, go back to our default of SET.  */
>
>    next_code = (code == MEM ? MEM
> -              : ((code == PLUS || code == MINUS)
> -                 && SCALAR_INT_MODE_P (mode)) ? MEM
>                : ((code == COMPARE || COMPARISON_P (x))
>                   && XEXP (x, 1) == const0_rtx) ? COMPARE
>                : in_code == COMPARE ? SET : in_code);
>
> On X86_64, it passes bootstrap and regression tests.
> But on Aarch64 the test in PR passed, but I got a few test case failures.
> There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> Now those patterns have to be fixed to use SHIFTS. Also need to see any impact on other targets.

Right.  It would be good if you could find out for what targets it matters.
The thing is, if a target expects only the patterns as combine used to make
them, it will regress (as you've seen on aarch64); but it will regress
_silently_.  Which isn't so nice.

> But before that I wanted to check if the assumption in combiner, can simply be removed ?

Seeing for what targets / patterns it makes a difference would tell us the
answer to that, too :-)

I'll run some tests with and without your patch.


Segher
On 04/29/2015 03:25 AM, Kumar, Venkataramanan wrote:
> Hi Jeff/Segher,
>
> Restarting the discussion on the GCC combiner assumption about Memory/address type.
> Ref: https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01298.html
> https://gcc.gnu.org/ml/gcc/2015-04/msg00028.html
> Regards,
> Venkat.
>
> PS: I am starting a new thread since I no more have access to Linaro ID from where I sent the earlier mail.

Funny, I sent mail just a few days ago trying to get this restarted, but it
went to your Linaro address.  I'll forward it separately.

Jeff
On 04/29/2015 11:03 AM, Segher Boessenkool wrote:
>
> Right.  It would be good if you could find out for what targets it matters.
> The thing is, if a target expects only the patterns as combine used to make
> them, it will regress (as you've seen on aarch64); but it will regress
> _silently_.  Which isn't so nice.
>
>> But before that I wanted to check if the assumption in combiner, can simply be removed ?
>
> Seeing for what targets / patterns it makes a difference would tell us the
> answer to that, too :-)

Right.  And that was one of the two general directions I recommended earlier
this week ;-)

1. Figure out if this code still matters at all.

2. If the code still matters, accurately track if we're inside a MEM so that
   things canonicalize correctly.

jeff
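For concreteness, the two canonical forms at stake look roughly like this. The example below is illustrative only; the function names and the RTL sketched in the comments are assumptions (a 64-bit target with 4-byte `int`), not something taken from the thread.

```c
/* Illustration only: the same scaled index computed once inside an
   address and once as a plain value.  Canonical RTL uses MULT inside
   a MEM and ASHIFT outside it (assumed 64-bit target, 4-byte int).  */
int
inside_mem (int *base, long i)
{
  return base[i];     /* address: (mem (plus (mult (reg i) 4) (reg base))) */
}

long
outside_mem (long a, long i)
{
  return a + i * 4;   /* value:   (plus (ashift (reg i) 2) (reg a)) */
}
```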
On 04/29/2015 03:25 AM, Kumar, Venkataramanan wrote:
> Hi Jeff/Segher,
>
> When we see an RTX code with PLUS or MINUS then it is treated as MEM/address type (we are inside address RTX).
> Is there any significance on that assumption?  I removed this assumption and the test case in the PR 63949 passed.

When appearing inside a MEM, we have different canonicalization rules.

The comment in make_compound_operation clearly indicates that the
PLUS/MINUS support is a hack.  However, I'm pretty sure it was strictly
for getting better code rather than for correctness.

One path forward is to properly track if we're in a MEM, at which point
the hack for PLUS/MINUS could probably just go away.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 5c763b4..945abdb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
>       but once inside, go back to our default of SET.  */
>
>    next_code = (code == MEM ? MEM
> -              : ((code == PLUS || code == MINUS)
> -                 && SCALAR_INT_MODE_P (mode)) ? MEM
>                : ((code == COMPARE || COMPARISON_P (x))
>                   && XEXP (x, 1) == const0_rtx) ? COMPARE
>                : in_code == COMPARE ? SET : in_code);
>
> On X86_64, it passes bootstrap and regression tests.
> But on Aarch64 the test in PR passed, but I got a few test case failures.
>
> Tests that now fail, but worked before:
>
> gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
> gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3

So that test seems to want to verify that you can generate a shift-add
type instruction.  I suspect the others are similar.  I'd be curious to
see the .combine dump as well as the final assembly for this test.

Which is a strong hint that we should be looking at targets with
shift-add style instructions.  ARM, AArch64, HPPA, x86 come immediately
to mind.

> There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.
>
> But before that I wanted to check if the assumption in combiner, can simply be removed ?

Given what I'm seeing now, I doubt it can simply be removed at this point
(which is a change in my position) since ports with these shift-add style
instructions have probably been tuned to work with the existing combine
behaviour.

We may need to do a deep test across various targets to identify those
affected and fix them.

jeff
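For reference, a function of roughly this shape is the kind of thing the adds1.c scan-assembler patterns above look for. This is an illustrative guess, not the actual testsuite source; the function name is made up.

```c
/* Illustrative guess at the failing test's shape: an add of a register
   shifted left by 3 whose result also feeds a comparison with zero, so
   AArch64 can use a flag-setting "adds ..., lsl 3".  */
int
adds_shift_3 (int a, int b, int c)
{
  int d = a + (b << 3);

  if (d == 0)
    return a + c;

  return d + c;
}
```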
On Wed, Apr 29, 2015 at 12:03:35PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 29, 2015 at 09:25:21AM +0000, Kumar, Venkataramanan wrote:
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 5c763b4..945abdb 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
> >       but once inside, go back to our default of SET.  */
> >
> >    next_code = (code == MEM ? MEM
> > -              : ((code == PLUS || code == MINUS)
> > -                 && SCALAR_INT_MODE_P (mode)) ? MEM
> >                : ((code == COMPARE || COMPARISON_P (x))
> >                   && XEXP (x, 1) == const0_rtx) ? COMPARE
> >                : in_code == COMPARE ? SET : in_code);
> >
> > On X86_64, it passes bootstrap and regression tests.
> > But on Aarch64 the test in PR passed, but I got a few test case failures.
>
> > There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> > Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.
>
> Right.  It would be good if you could find out for what targets it matters.
> The thing is, if a target expects only the patterns as combine used to make
> them, it will regress (as you've seen on aarch64); but it will regress
> _silently_.  Which isn't so nice.
>
> > But before that I wanted to check if the assumption in combiner, can simply be removed ?
>
> Seeing for what targets / patterns it makes a difference would tell us the
> answer to that, too :-)
>
> I'll run some tests with and without your patch.

So I ran the tests.  These are text sizes for vmlinux built for most
architectures (before resp. after the patch).  Results are good, but
they show many targets need some work...


BIGGER
  5410496   5432744  alpha
  3848987   3849143  arm
  2190276   2190292  blackfin
  2188431   2191503  c6x  (but data shrank by more than text grew)
  2212322   2212706  cris
 10896454  10896965  i386
  7471891   7488771  parisc
  6168799   6195391  parisc64
  1545840   1545872  shnommu
  1946649   1954149  xtensa

I looked at some of those.  Alpha regresses with s4add things, arm with
add ,lsl things, parisc with shladd things.  I tried to fix the parisc
one (it seemed simplest), and the 32-bit kernel got most (but not all)
of the size difference back; and the 64-bit wouldn't build (an assert in
the kernel failed, and it uses a floating point reg where an integer one
is needed -- I gave up).

SMALLER
  8688171   8688003  powerpc
 20252605  20251797  powerpc64
 11425334  11422558  s390
 12300135  12299767  x86_64

The PowerPC differences are mostly what first was rlwinm;addi;rlwinm;lwz
and now is rlwinm;lwz; i.e. the add is moved after the two rotates, the
rotates are merged, and the add is made part of the offset in the load.

NO DIFF
  3321492   3321492  frv
  3252747   3252747  m32r
  4708232   4708232  microblaze
  3949145   3949145  mips
  5693019   5693019  mips64
  2373441   2373441  mn10300
  6489846   6489846  sh64
  3733749   3733749  sparc
  6075122   6075122  sparc64

Also quite a few without any diff.  Good :-)  Some of those might have
unnecessary add/mult patterns now, but they also have the add/shift.

I didn't see any big differences, and all are of the expected kind.

Some targets need some love (but what else is new).  The patch is
approved for trunk.

Thanks,


Segher
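As a rough illustration of the PowerPC improvement described above, code of the following shape would plausibly show the rlwinm;addi;rlwinm;lwz to rlwinm;lwz change; this is a guess at the kind of access involved, not source taken from the kernel, and the function name is made up.

```c
/* Illustrative guess: an array indexed by a masked bit-field plus a small
   constant.  Before the patch this tends to become rotate-and-mask, add,
   rotate-and-mask, load; afterwards the two rotate/mask operations can be
   merged and the constant folded into the load's displacement.  */
unsigned int
table_lookup (unsigned int *tab, unsigned int bits)
{
  return tab[((bits >> 4) & 0x3f) + 2];
}
```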
On Fri, 1 May 2015, Segher Boessenkool wrote:
> On Wed, Apr 29, 2015 at 12:03:35PM -0500, Segher Boessenkool wrote:
> > On Wed, Apr 29, 2015 at 09:25:21AM +0000, Kumar, Venkataramanan wrote:
> > > diff --git a/gcc/combine.c b/gcc/combine.c
> > > index 5c763b4..945abdb 100644
> > > --- a/gcc/combine.c
> > > +++ b/gcc/combine.c
> > > @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
> > >       but once inside, go back to our default of SET.  */
> > >
> > >    next_code = (code == MEM ? MEM
> > > -              : ((code == PLUS || code == MINUS)
> > > -                 && SCALAR_INT_MODE_P (mode)) ? MEM
> > >                : ((code == COMPARE || COMPARISON_P (x))
> > >                   && XEXP (x, 1) == const0_rtx) ? COMPARE
> > >                : in_code == COMPARE ? SET : in_code);
> > >
> > > On X86_64, it passes bootstrap and regression tests.
> > > But on Aarch64 the test in PR passed, but I got a few test case failures.
> >
> > > There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> > > Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.
> >
> > Right.  It would be good if you could find out for what targets it matters.
> > The thing is, if a target expects only the patterns as combine used to make
> > them, it will regress (as you've seen on aarch64); but it will regress
> > _silently_.  Which isn't so nice.
> >
> > > But before that I wanted to check if the assumption in combiner, can simply be removed ?
> >
> > Seeing for what targets / patterns it makes a difference would tell us the
> > answer to that, too :-)
> >
> > I'll run some tests with and without your patch.
>
> So I ran the tests.  These are text sizes for vmlinux built for most
> architectures (before resp. after the patch).

Thanks for actually checking the impact.

> Results are good, but
> they show many targets need some work...
>
>
> BIGGER
>   2212322   2212706  cris

Also observable as noted in PR66171, a regression:

Running /tmp/hpautotest-gcc0/gcc/gcc/testsuite/gcc.target/cris/cris.exp ...
FAIL: gcc.target/cris/biap.c scan-assembler addi
FAIL: gcc.target/cris/biap.c scan-assembler-not lsl

I confess the test-case-"guarded" addi pattern should have been
expressed with a shift in addition to the multiplication.  ("In
addition to" as the canonically wrong one used to be the
combine-matching pattern; I'm not sure I should really drop that
just yet.)  I'll have to check that expressing using a shift fixes
this issue.

Supposedly more noteworthy: this now-stricter canonicalization
leads to a requirement to rewrite patterns that used to be:

 (parallel
  [(set reg0 (mem (plus (mult reg1 N) reg2)))
   (set reg3 (plus (mult reg1 N) reg2))])

into the awkwardly asymmetric:

 (parallel
  [(set reg0 (mem (plus (mult reg1 2) reg2)))
   (set reg3 (plus (ashift reg1 M) reg2))])

(where M = log2 N)

and of course a need to verify that combine actually *does* make
use of the pattern after the change at least as much as it did
before.

> Some targets need some love (but what else is new).

Meh.  Well, you now got some whine to go with that cheese.

brgds, H-P
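For readers not familiar with that kind of parallel, source of roughly the following shape is what a combined load-plus-kept-address pattern targets. This is an illustrative guess, not the CRIS port's actual test case; the function name and the RTL in the comments (4-byte `int`) are assumptions.

```c
/* Illustrative guess: both the loaded value and the computed address are
   wanted, which is the situation the parallel above matches.  With a
   4-byte int the load's address uses (plus (mult idx 4) base) inside the
   MEM, while the kept copy is canonically (plus (ashift idx 2) base).  */
int
load_and_keep_address (int *base, long idx, int **addr_out)
{
  int *p = base + idx;   /* kept address: shift form outside the MEM */
  *addr_out = p;
  return *p;             /* load: mult form inside the MEM */
}
```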
On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> I confess the test-case-"guarded" addi pattern should have been
> expressed with a shift in addition to the multiplication.

But they wouldn't ever match so they might very well have bitrotted
by now :-(

> ("In
> addition to" as the canonically wrong one used to be the
> combine-matching pattern; I'm not sure I should really drop that
> just yet.)

It is harmless to leave it in.  It will rot though, eventually --
better take it out before then.  Add some gcc_unreachable, perhaps.

> Supposedly more noteworthy: this now-stricter canonicalization
> leads to a requirement to rewrite patterns that used to be:
>
>  (parallel
>   [(set reg0 (mem (plus (mult reg1 N) reg2)))
>    (set reg3 (plus (mult reg1 N) reg2))])
>
> into the awkwardly asymmetric:
>
>  (parallel
>   [(set reg0 (mem (plus (mult reg1 2) reg2)))
>    (set reg3 (plus (ashift reg1 M) reg2))])
>
> (where M = log2 N)

Yeah.  This is true of parallels in general: canonicalisation works
on each arm separately.  I'm not sure what can be done about it.


Looks like quite some work for you, I'm sorry about that,


Segher
On Sat, 16 May 2015, Segher Boessenkool wrote:
> On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> > I confess the test-case-"guarded" addi pattern should have been
> > expressed with a shift in addition to the multiplication.
>
> But they wouldn't ever match so they might very well have bitrotted
> by now :-(

It seems you're saying that the canonicalization to "ashift"
didn't work *at all*, when starting with an expression from an
address?  I knew it failed in part, but always thought it was
just a partial failure.

> > ("In
> > addition to" as the canonically wrong one used to be the
> > combine-matching pattern; I'm not sure I should really drop that
> > just yet.)
>
> It is harmless to leave it in.  It will rot though, eventually --
> better take it out before then.  Add some gcc_unreachable, perhaps.

I've been trying to come up with valid reasons there'd ever be
canonicalization by multiplication, but failed, so I guess
I'll rip it out.

> > Supposedly more noteworthy: this now-stricter canonicalization
> > leads to a requirement to rewrite patterns that used to be:
> >
> >  (parallel
> >   [(set reg0 (mem (plus (mult reg1 N) reg2)))
> >    (set reg3 (plus (mult reg1 N) reg2))])
> >
> > into the awkwardly asymmetric:
> >
> >  (parallel
> >   [(set reg0 (mem (plus (mult reg1 2) reg2)))
> >    (set reg3 (plus (ashift reg1 M) reg2))])
> >
> > (where M = log2 N)
>
> Yeah.  This is true of parallels in general: canonicalisation works
> on each arm separately.  I'm not sure what can be done about it.
>
> Looks like quite some work for you, I'm sorry about that,

It's almost over, at least the editing part...

brgds, H-P
On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:
> On Sat, 16 May 2015, Segher Boessenkool wrote:
> > On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> > > I confess the test-case-"guarded" addi pattern should have been
> > > expressed with a shift in addition to the multiplication.
> >
> > But they wouldn't ever match so they might very well have bitrotted
> > by now :-(
>
> It seems you're saying that the canonicalization to "ashift"
> didn't work *at all*, when starting with an expression from an
> address?  I knew it failed in part, but always thought it was
> just a partial failure.

With a plus or minus combine would always write it as a mult.
I don't think any other pass would create this combination.  I
haven't tested it though.

> > > ("In
> > > addition to" as the canonically wrong one used to be the
> > > combine-matching pattern; I'm not sure I should really drop that
> > > just yet.)
> >
> > It is harmless to leave it in.  It will rot though, eventually --
> > better take it out before then.  Add some gcc_unreachable, perhaps.
>
> I've been trying to come up with valid reasons there'd be ever
> be canonicalization by multiplication, but failed so I guess
> I'll rip it out.

The "unreachable" thing should quickly tell you if that guess is wrong.
Not something you want to leave in a production compiler, of course.

> > Looks like quite some work for you, I'm sorry about that,
>
> It's almost over, at least the editing part...

Great to hear that!


Segher
On Sat, 16 May 2015, Segher Boessenkool wrote:
> On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:
> > On Sat, 16 May 2015, Segher Boessenkool wrote:
> > > On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> > > > I confess the test-case-"guarded" addi pattern should have been
> > > > expressed with a shift in addition to the multiplication.
> > >
> > > But they wouldn't ever match so they might very well have bitrotted
> > > by now :-(
> >
> > It seems you're saying that the canonicalization to "ashift"
> > didn't work *at all*, when starting with an expression from an
> > address?  I knew it failed in part, but always thought it was
> > just a partial failure.
>
> With a plus or minus combine would always write it as a mult.
> I don't think any other pass would create this combination.  I
> haven't tested it though.

Ports probably also generate that internally at various RTL
passes, something that takes a bit more than an at-a-glance code
inspection.

> > > > ("In
> > > > addition to" as the canonically wrong one used to be the
> > > > combine-matching pattern; I'm not sure I should really drop that
> > > > just yet.)
> > >
> > > It is harmless to leave it in.  It will rot though, eventually --
> > > better take it out before then.  Add some gcc_unreachable, perhaps.
> >
> > I've been trying to come up with valid reasons there'd be ever
> > be canonicalization by multiplication, but failed so I guess
> > I'll rip it out.
>
> The "unreachable" thing should quickly tell you if that guess is wrong.
> Not something you want to leave in a production compiler, of course.

I think you misunderstood; I mean that I pondered
"philosophical" reasons to change the canonical representation;
if there was some reasoning, current or looking forward, where
we'd "add back" mult as non-address-canonical RTL.

For the current scheme, keeping the matching patterns but with
assertions added might be helpful, yes.  (People looking for
practical clues: for spotting dead target code there are, besides
the alluded-to gcc_unreachable(), also the options of calling
internal_error() with a friendly string or of generating invalid
assembly with a descriptive mnemonic.)

Actually, there are two things you could help with:

- (pointer-to) step-by-step instructions to recreate the Linux
(kernel :) build, as those you did before for a multiple of
targets.  I'd like to know I fixed your observations.

- add a preferred canonicalization function to do conversion
to/from memory-address-canonical RTL.  Like
fwprop.c:canonicalize_address (just not static :) and maybe also
a canonicalize_nonaddress.  At the moment, ports call
replace_equiv_address (family of functions) when changing
address RTL, but that code-path (at a glance) doesn't
canonicalize whatever non-address-canonical RTL you throw at it.
Maybe it should?  (A rough sketch of the kind of helper I mean
follows below.)

> > > Looks like quite some work for you, I'm sorry about that,
> >
> > It's almost over, at least the editing part...
>
> Great to hear that!

Thanks for your enthusiasm!  Well, I've known about this wart for
6+ years as seen in PR37939 (but before that likely at least as
long), so I *should* thank you for fixing it; it's just that the
added churn and required tweaks have somewhat of a sour taste.  :)

brgds, H-P
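A rough sketch of the kind of helper meant in the second item above, modeled loosely on fwprop.c:canonicalize_address; this is an assumption about shape, not existing GCC code, and the function name is made up. It would live inside GCC and rely on rtl.h.

```c
/* Sketch only, not existing code: rewrite a constant ASHIFT into the MULT
   form that is canonical inside a MEM address.  A real version would
   recurse over the whole address expression, as
   fwprop.c:canonicalize_address does.  */
static rtx
make_address_canonical (rtx x)
{
  /* (ashift X (const_int N)) => (mult X (const_int 2**N))  */
  if (GET_CODE (x) == ASHIFT
      && CONST_INT_P (XEXP (x, 1))
      && INTVAL (XEXP (x, 1)) >= 0
      && INTVAL (XEXP (x, 1)) < HOST_BITS_PER_WIDE_INT - 1)
    return gen_rtx_MULT (GET_MODE (x), XEXP (x, 0),
                         gen_int_mode ((HOST_WIDE_INT) 1
                                       << INTVAL (XEXP (x, 1)),
                                       GET_MODE (x)));

  return x;
}
```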
On Sat, May 16, 2015 at 09:55:08PM -0400, Hans-Peter Nilsson wrote:
> > With a plus or minus combine would always write it as a mult.
> > I don't think any other pass would create this combination.  I
> > haven't tested it though.
>
> Ports probably also generate that internally at various RTL
> passes, something that takes a bit more than an at-a-glance code
> inspection.

Yeah, true.  Hopefully port maintainers will know about what the port
does, or at least notice the ICEs.

> > > I've been trying to come up with valid reasons there'd be ever
> > > be canonicalization by multiplication, but failed so I guess
> > > I'll rip it out.
> >
> > The "unreachable" thing should quickly tell you if that guess is wrong.
> > Not something you want to leave in a production compiler, of course.
>
> I think you misunderstood; I mean that I pondered
> "philosophical" reasons to change the canonical representation;
> if there was some reasoning, current or looking forward, where
> we'd "add back" mult as non-address-canonical RTL.

Ah.  Yes, same here -- I don't see any situation where having mult
instead of shift outside of a mem would be more convenient.

> Actually, there are two things you could help with:
>
> - (pointer-to) step-by-step instructions to recreate the Linux
> (kernel :) build, as those you did before for a multiple of
> targets.  I'd like to know I fixed your observations.

I used <http://git.infradead.org/users/segher/buildall.git>; it has
a README.  I see that doc is a little out of date, I'll update.

> - add a preferred canonicalization function to do conversion
> to/from memory-address-canonical RTL.  Like
> fwprop.c:canonicalize_address (just not static :) and maybe also
> a canonicalize_nonaddress.  At the moment, ports call
> replace_equiv_address (family of functions) when changing
> address RTL, but that code-path (at a glance) doesn't
> canonicalize whatever non-address-canonical RTL you throw at it.
> Maybe it should?

Maybe validize_mem is nicer?  It depends much what you really
want to change.

Does simplify_rtx not do what you want for "nonaddress"?


Segher
On Sun, 17 May 2015, Segher Boessenkool wrote:
> I used <http://git.infradead.org/users/segher/buildall.git>; it has
> a README.  I see that doc is a little out of date, I'll update.

("git:" not "http:" for cloning)

Thanks, looks useful.  Hm, maybe we already mention this in the wiki...

> > - add a preferred canonicalization function to do conversion
> > to/from memory-address-canonical RTL.  Like
> > fwprop.c:canonicalize_address (just not static :) and maybe also
> > a canonicalize_nonaddress.  At the moment, ports call
> > replace_equiv_address (family of functions) when changing
> > address RTL, but that code-path (at a glance) doesn't
> > canonicalize whatever non-address-canonical RTL you throw at it.
> > Maybe it should?
>
> Maybe validize_mem is nicer?  It depends much what you really
> want to change.

I'd have made use of a function that'd automatically rewrite an
ASHIFT into a MULT.  No, validize_mem is kind-of a sibling to
replace_equiv_address with just the containing MEM and will just
call it, says the code.  A call to replace_equiv_address (& Co.)
will *break out* invalid expressions like an ASHIFT - or at least
is supposed to do that.  But that'd be mostly for convenience of
other ports; I've coped now, I think.

> Does simplify_rtx not do what you want for "nonaddress"?

Oh, right, that's already what we have, thanks.

brgds, H-P
On 05/16/2015 07:55 PM, Hans-Peter Nilsson wrote:
> On Sat, 16 May 2015, Segher Boessenkool wrote:
>> On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:
>>> On Sat, 16 May 2015, Segher Boessenkool wrote:
>>>> On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
>>>>> I confess the test-case-"guarded" addi pattern should have been
>>>>> expressed with a shift in addition to the multiplication.
>>>>
>>>> But they wouldn't ever match so they might very well have bitrotted
>>>> by now :-(
>>>
>>> It seems you're saying that the canonicalization to "ashift"
>>> didn't work *at all*, when starting with an expression from an
>>> address?  I knew it failed in part, but always thought it was
>>> just a partial failure.
>>
>> With a plus or minus combine would always write it as a mult.
>> I don't think any other pass would create this combination.  I
>> haven't tested it though.
>
> Ports probably also generate that internally at various RTL
> passes, something that takes a bit more than an at-a-glance code
> inspection.

Correct.  The PA port for example has a ton of this kind of RTL
rewriting to exploit the shift-add insns and scaled indexed addressing
modes (and correct for some oddities in the PA chip where the scaled
modes don't exist in every context where you'd think they should).

And you still have to worry about things like combine taking a
(mem (plus (mult))), selecting the (plus (mult)) as a split point and
failing to canonicalize it into the ashift form.  I ran into that while
fixing up the PA for these changes.

The good news is that with two trivial combine changes and the expected
changes to the PA backend, I can get code generation back to where it
was before across my sample testfiles.

Jeff
diff --git a/gcc/combine.c b/gcc/combine.c
index 5c763b4..945abdb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
      but once inside, go back to our default of SET.  */
 
   next_code = (code == MEM ? MEM
-              : ((code == PLUS || code == MINUS)
-                 && SCALAR_INT_MODE_P (mode)) ? MEM
               : ((code == COMPARE || COMPARISON_P (x))
                  && XEXP (x, 1) == const0_rtx) ? COMPARE
               : in_code == COMPARE ? SET : in_code);