
[RFC] : Remove Mem/address type assumption in combiner

Message ID 7794A52CE4D579448B959EED7DD0A4723DCE9F34@satlexdag06.amd.com

Commit Message

Kumar, Venkataramanan April 29, 2015, 9:25 a.m. UTC
Hi Jeff/Segher, 

Restarting the discussion on the GCC combiner assumption about Memory/address type.
Ref: https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01298.html
https://gcc.gnu.org/ml/gcc/2015-04/msg00028.html

While working on the test case in PR 63949, I came across the code below in combine.c:make_compound_operation.

--snip---
  /* Select the code to be used in recursive calls.  Once we are inside an
     address, we stay there.  If we have a comparison, set to COMPARE,
     but once inside, go back to our default of SET.  */

  next_code = (code == MEM ? MEM
               : ((code == PLUS || code == MINUS)
                  && SCALAR_INT_MODE_P (mode)) ? MEM
               : ((code == COMPARE || COMPARISON_P (x))
                  && XEXP (x, 1) == const0_rtx) ? COMPARE
               : in_code == COMPARE ? SET : in_code);
---snip--

When we see an RTX code of PLUS or MINUS, it is treated as MEM/address type (as if we were inside an address RTX).
Is there any significance to that assumption?  I removed it, and the test case in PR 63949 passed.



On x86_64, the patch passes bootstrap and regression tests.
On AArch64 the test in the PR passes, but I got a few other test case failures.

Tests that now fail, but worked before:

gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2
gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, x[0-9]+, sxtw 2

Based on in_code: if it is of "MEM" type, the combiner converts shifts into multiply operations.

--snip--
  switch (code)
    {
    case ASHIFT:
      /* Convert shifts by constants into multiplications if inside
         an address.  */
      if (in_code == MEM && CONST_INT_P (XEXP (x, 1))
          && INTVAL (XEXP (x, 1)) < HOST_BITS_PER_WIDE_INT
          && INTVAL (XEXP (x, 1)) >= 0
          && SCALAR_INT_MODE_P (mode))
 ---snip----

There are a few patterns in the AArch64 backend based on multiplication operations that match the patterns the combiner generates.
Those patterns now have to be fixed to use shifts.  We also need to check for any impact on other targets.

But before that, I wanted to check whether the assumption in the combiner can simply be removed.

Regards,
Venkat.

PS: I am starting a new thread since I no longer have access to the Linaro ID from which I sent the earlier mail.

Comments

Segher Boessenkool April 29, 2015, 5:03 p.m. UTC | #1
Hello Venkat,

On Wed, Apr 29, 2015 at 09:25:21AM +0000, Kumar, Venkataramanan wrote:
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 5c763b4..945abdb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
>       but once inside, go back to our default of SET.  */
> 
>    next_code = (code == MEM ? MEM
> -              : ((code == PLUS || code == MINUS)
> -                 && SCALAR_INT_MODE_P (mode)) ? MEM
>                : ((code == COMPARE || COMPARISON_P (x))
>                   && XEXP (x, 1) == const0_rtx) ? COMPARE
>                : in_code == COMPARE ? SET : in_code);
> 
> 
> On X86_64, it passes bootstrap and regression tests.
> But on Aarch64 the test in PR passed, but I got a few test case failures.

> There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.

Right.  It would be good if you could find out for what targets it matters.
The thing is, if a target expects only the patterns as combine used to make
them, it will regress (as you've seen on aarch64); but it will regress
_silently_.  Which isn't so nice.

> But  before that  I wanted to check if the assumption in combiner,  can simply be removed ?

Seeing for what targets / patterns it makes a difference would tell us the
answer to that, too :-)

I'll run some tests with and without your patch.


Segher
Jeff Law April 29, 2015, 7:08 p.m. UTC | #2
On 04/29/2015 03:25 AM, Kumar, Venkataramanan wrote:
> Hi Jeff/Segher,
>
> Restarting the discussion on the GCC combiner assumption about Memory/address type.
> Ref: https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01298.html
> https://gcc.gnu.org/ml/gcc/2015-04/msg00028.html

> Regards,
> Venkat.
>
> PS:  I am starting a new thread since I no more have access to Linaro ID from where I sent the earlier mail.
Funny, I sent mail just a few days ago trying to get this restarted, but 
it went to your Linaro address.  I'll forward it separately.


Jeff
Jeff Law April 29, 2015, 7:10 p.m. UTC | #3
On 04/29/2015 11:03 AM, Segher Boessenkool wrote:
>
> Right.  It would be good if you could find out for what targets it matters.
> The thing is, if a target expects only the patterns as combine used to make
> them, it will regress (as you've seen on aarch64); but it will regress
> _silently_.  Which isn't so nice.
>
>> But  before that  I wanted to check if the assumption in combiner,  can simply be removed ?
>
> Seeing for what targets / patterns it makes a difference would tell us the
> answer to that, too :-)
Right.  And that was one of the two general directions I recommended 
earlier this week ;-)

1. Figure out if this code still matters at all.
2. If the code still matters, accurately track if we're
    inside a MEM so that things canonicalize correctly.

jeff
Jeff Law April 29, 2015, 7:22 p.m. UTC | #4
On 04/29/2015 03:25 AM, Kumar, Venkataramanan wrote:
> Hi Jeff/Segher,
>
> When we see an  RTX code with PLUS or MINUS then it is treated as  MEM/address type (we are inside address RTX).
> Is there any significance on that assumption?  I removed this assumption and the test case in the PR 63949 passed.
When appearing inside a MEM, we have different canonicalization rules. 
The comment in make_compound_operation clearly indicates that the 
PLUS/MINUS support is a hack.  However, I'm pretty sure it was strictly 
for getting better code rather than for correctness.

One path forward is to properly track if we're in a MEM, at which point 
the hack for PLUS/MINUS could probably just go away.


>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 5c763b4..945abdb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
>        but once inside, go back to our default of SET.  */
>
>     next_code = (code == MEM ? MEM
> -              : ((code == PLUS || code == MINUS)
> -                 && SCALAR_INT_MODE_P (mode)) ? MEM
>                 : ((code == COMPARE || COMPARISON_P (x))
>                    && XEXP (x, 1) == const0_rtx) ? COMPARE
>                 : in_code == COMPARE ? SET : in_code);
>
>
> On X86_64, it passes bootstrap and regression tests.
> But on Aarch64 the test in PR passed, but I got a few test case failures.
>
> Tests that now fail, but worked before:
>
> gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
> gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
So that test seems to want to verify that you can generate a shift-add 
type instruction.  I suspect the others are similar.  I'd be curious to 
see the .combine dump as well as the final assembly for this test.


Which is a strong hint that we should be looking at targets with 
shift-add style instructions.  ARM, AArch64, HPPA, x86 come immediately 
to mind.


>>
> There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.
>
> But  before that  I wanted to check if the assumption in combiner,  can simply be removed ?
Given what I'm seeing now, I doubt it can simply be removed at this 
point (which is a change in my position) since ports with these 
shift-add style instructions have probably been tuned to work with 
existing combine behaviour.  We may need to do a deep test across 
various targets to identify those affected and fix them.

jeff
Segher Boessenkool May 1, 2015, 3:17 p.m. UTC | #5
On Wed, Apr 29, 2015 at 12:03:35PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 29, 2015 at 09:25:21AM +0000, Kumar, Venkataramanan wrote:
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 5c763b4..945abdb 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
> >       but once inside, go back to our default of SET.  */
> > 
> >    next_code = (code == MEM ? MEM
> > -              : ((code == PLUS || code == MINUS)
> > -                 && SCALAR_INT_MODE_P (mode)) ? MEM
> >                : ((code == COMPARE || COMPARISON_P (x))
> >                   && XEXP (x, 1) == const0_rtx) ? COMPARE
> >                : in_code == COMPARE ? SET : in_code);
> > 
> > 
> > On X86_64, it passes bootstrap and regression tests.
> > But on Aarch64 the test in PR passed, but I got a few test case failures.
> 
> > There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> > Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.
> 
> Right.  It would be good if you could find out for what targets it matters.
> The thing is, if a target expects only the patterns as combine used to make
> them, it will regress (as you've seen on aarch64); but it will regress
> _silently_.  Which isn't so nice.
> 
> > But  before that  I wanted to check if the assumption in combiner,  can simply be removed ?
> 
> Seeing for what targets / patterns it makes a difference would tell us the
> answer to that, too :-)
> 
> I'll run some tests with and without your patch.

So I ran the tests.  These are text sizes for vmlinux built for most
architectures (before resp. after the patch).  Results are good, but
they show many targets need some work...


BIGGER
 5410496   5432744   alpha
 3848987   3849143   arm
 2190276   2190292   blackfin
 2188431   2191503   c6x (but data shrank by more than text grew)
 2212322   2212706   cris
10896454  10896965   i386
 7471891   7488771   parisc
 6168799   6195391   parisc64
 1545840   1545872   shnommu
 1946649   1954149   xtensa

I looked at some of those.  Alpha regresses with s4add things, arm with
add,lsl things, parisc with shladd things.  I tried to fix the parisc
one (it seemed simplest), and the 32-bit kernel got most (but not all)
of the size difference back; and the 64-bit wouldn't build (an assert
in the kernel failed, and it uses a floating point reg where an integer
one is needed -- I gave up).


SMALLER
 8688171   8688003   powerpc
20252605  20251797   powerpc64
11425334  11422558   s390
12300135  12299767   x86_64

The PowerPC differences are mostly what first was rlwinm;addi;rlwinm;lwz
and now is rlwinm;lwz; i.e. the add is moved after the two rotates, the
rotates are merged, and the add is made part of the offset in the load.


NO DIFF
 3321492   3321492   frv
 3252747   3252747   m32r
 4708232   4708232   microblaze
 3949145   3949145   mips
 5693019   5693019   mips64
 2373441   2373441   mn10300
 6489846   6489846   sh64
 3733749   3733749   sparc
 6075122   6075122   sparc64

Also quite a few without any diff.  Good :-)  Some of those might have
unnecessary add/mult patterns now, but they also have the add/shift.


I didn't see any big differences, and all are of the expected kind.
Some targets need some love (but what else is new).

The patch is approved for trunk.

Thanks,


Segher
Hans-Peter Nilsson May 16, 2015, 2:40 a.m. UTC | #6
On Fri, 1 May 2015, Segher Boessenkool wrote:
> On Wed, Apr 29, 2015 at 12:03:35PM -0500, Segher Boessenkool wrote:
> > On Wed, Apr 29, 2015 at 09:25:21AM +0000, Kumar, Venkataramanan wrote:
> > > diff --git a/gcc/combine.c b/gcc/combine.c
> > > index 5c763b4..945abdb 100644
> > > --- a/gcc/combine.c
> > > +++ b/gcc/combine.c
> > > @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
> > >       but once inside, go back to our default of SET.  */
> > >
> > >    next_code = (code == MEM ? MEM
> > > -              : ((code == PLUS || code == MINUS)
> > > -                 && SCALAR_INT_MODE_P (mode)) ? MEM
> > >                : ((code == COMPARE || COMPARISON_P (x))
> > >                   && XEXP (x, 1) == const0_rtx) ? COMPARE
> > >                : in_code == COMPARE ? SET : in_code);
> > >
> > >
> > > On X86_64, it passes bootstrap and regression tests.
> > > But on Aarch64 the test in PR passed, but I got a few test case failures.
> >
> > > There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated.
> > > Now those patterns have to be fixed to use SHIFTS.  Also need to see any impact on other targets.
> >
> > Right.  It would be good if you could find out for what targets it matters.
> > The thing is, if a target expects only the patterns as combine used to make
> > them, it will regress (as you've seen on aarch64); but it will regress
> > _silently_.  Which isn't so nice.
> >
> > > But  before that  I wanted to check if the assumption in combiner,  can simply be removed ?
> >
> > Seeing for what targets / patterns it makes a difference would tell us the
> > answer to that, too :-)
> >
> > I'll run some tests with and without your patch.
>
> So I ran the tests.  These are text sizes for vmlinux built for most
> architectures (before resp. after the patch).

Thanks for actually checking the impact.

>  Results are good, but
> they show many targets need some work...
>
>
> BIGGER
>  2212322   2212706   cris

Also observable as noted in PR66171, a regression:

Running /tmp/hpautotest-gcc0/gcc/gcc/testsuite/gcc.target/cris/cris.exp
...
FAIL: gcc.target/cris/biap.c scan-assembler addi
FAIL: gcc.target/cris/biap.c scan-assembler-not lsl

I confess the test-case-"guarded" addi pattern should have been
expressed with a shift in addition to the multiplication. ("In
addition to" as the canonically wrong one used to be the
combine-matching pattern; I'm not sure I should really drop that
just yet.)  I'll have to check that expressing using a shift
fixes this issue.

Supposedly more noteworthy: this now-stricter canonicalization
leads to a requirement to rewrite patterns that used to be:

 (parallel
  [(set reg0 (mem (plus (mult reg1 N) reg2)))
   (set reg3 (plus (mult reg1 N) reg2))])

into the awkwardly asymmetric:

 (parallel
  [(set reg0 (mem (plus (mult reg1 N) reg2)))
   (set reg3 (plus (ashift reg1 M) reg2))])

(where M = log2 N)

and of course a need to verify that combine actually *does* make
use of the pattern after the change at least as much as it did
before.

> Some targets need some love (but what else is new).

Meh.  Well, you now got some whine to go with that cheese.

brgds, H-P
Segher Boessenkool May 16, 2015, 2:32 p.m. UTC | #7
On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> I confess the test-case-"guarded" addi pattern should have been
> expressed with a shift in addition to the multiplication.

But they wouldn't ever match so they might very well have bitrotted
by now :-(

> ("In
> addition to" as the canonically wrong one used to be the
> combine-matching pattern; I'm not sure I should really drop that
> just yet.)

It is harmless to leave it in.  It will rot though, eventually --
better take it out before then.  Add some gcc_unreachable, perhaps.

> Supposedly more noteworthy: this now-stricter canonicalization
> leads to a requirement to rewrite patterns that used to be:
> 
>  (parallel
>   [(set reg0 (mem (plus (mult reg1 N) reg2)))
>    (set reg3 (plus (mult reg1 N) reg2))])
> 
> into the awkwardly asymmetric:
> 
>  (parallel
>   [(set reg0 (mem (plus (mult reg1 N) reg2)))
>    (set reg3 (plus (ashift reg1 M) reg2))])
> 
> (where M = log2 N)

Yeah.  This is true of parallels in general: canonicalisation works
on each arm separately.  I'm not sure what can be done about it.


Looks like quite some work for you, I'm sorry about that,


Segher
Hans-Peter Nilsson May 16, 2015, 4:36 p.m. UTC | #8
On Sat, 16 May 2015, Segher Boessenkool wrote:
> On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> > I confess the test-case-"guarded" addi pattern should have been
> > expressed with a shift in addition to the multiplication.
>
> But they wouldn't ever match so they might very well have bitrotted
> by now :-(

It seems you're saying that the canonicalization to "ashift"
didn't work *at all*, when starting with an expression from an
address?  I knew it failed in part, but always thought it was
just a partial failure.

> > ("In
> > addition to" as the canonically wrong one used to be the
> > combine-matching pattern; I'm not sure I should really drop that
> > just yet.)
>
> It is harmless to leave it in.  It will rot though, eventually --
> better take it out before then.  Add some gcc_unreachable, perhaps.

I've been trying to come up with valid reasons there'd ever
be canonicalization by multiplication, but failed, so I guess
I'll rip it out.

> > Supposedly more noteworthy: this now-stricter canonicalization
> > leads to a requirement to rewrite patterns that used to be:
> >
> >  (parallel
> >   [(set reg0 (mem (plus (mult reg1 N) reg2)))
> >    (set reg3 (plus (mult reg1 N) reg2))])
> >
> > into the awkwardly asymmetric:
> >
> >  (parallel
> >   [(set reg0 (mem (plus (mult reg1 N) reg2)))
> >    (set reg3 (plus (ashift reg1 M) reg2))])
> >
> > (where M = log2 N)
>
> Yeah.  This is true of parallels in general: canonicalisation works
> on each arm separately.  I'm not sure what can be done about it.
>
>
> Looks like quite some work for you, I'm sorry about that,

It's almost over, at least the editing part...

brgds, H-P
Segher Boessenkool May 16, 2015, 5:01 p.m. UTC | #9
On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:
> On Sat, 16 May 2015, Segher Boessenkool wrote:
> > On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> > > I confess the test-case-"guarded" addi pattern should have been
> > > expressed with a shift in addition to the multiplication.
> >
> > But they wouldn't ever match so they might very well have bitrotted
> > by now :-(
> 
> It seems you're saying that the canonicalization to "ashift"
> didn't work *at all*, when starting with an expression from an
> address?  I knew it failed in part, but always thought it was
> just a partial failure.

Within a plus or minus, combine would always write it as a mult.
I don't think any other pass would create this combination.  I
haven't tested it though.

> > > ("In
> > > addition to" as the canonically wrong one used to be the
> > > combine-matching pattern; I'm not sure I should really drop that
> > > just yet.)
> >
> > It is harmless to leave it in.  It will rot though, eventually --
> > better take it out before then.  Add some gcc_unreachable, perhaps.
> 
> I've been trying to come up with valid reasons there'd be ever
> be canonicalization by multiplication, but failed so I guess
> I'll rip it out.

The "unreachable" thing should quickly tell you if that guess is wrong.
Not something you want to leave in a production compiler, of course.

> > Looks like quite some work for you, I'm sorry about that,
> 
> It's almost over, at least the editing part...

Great to hear that!


Segher
Hans-Peter Nilsson May 17, 2015, 1:55 a.m. UTC | #10
On Sat, 16 May 2015, Segher Boessenkool wrote:
> On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:
> > On Sat, 16 May 2015, Segher Boessenkool wrote:
> > > On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
> > > > I confess the test-case-"guarded" addi pattern should have been
> > > > expressed with a shift in addition to the multiplication.
> > >
> > > But they wouldn't ever match so they might very well have bitrotted
> > > by now :-(
> >
> > It seems you're saying that the canonicalization to "ashift"
> > didn't work *at all*, when starting with an expression from an
> > address?  I knew it failed in part, but always thought it was
> > just a partial failure.
>
> With a plus or minus combine would always write it as a mult.
> I don't think any other pass would create this combination.  I
> haven't tested it though.

Ports probably also generate that internally at various RTL
passes, something that takes a bit more than an at-a-glance code
inspection.

> > > > ("In
> > > > addition to" as the canonically wrong one used to be the
> > > > combine-matching pattern; I'm not sure I should really drop that
> > > > just yet.)
> > >
> > > It is harmless to leave it in.  It will rot though, eventually --
> > > better take it out before then.  Add some gcc_unreachable, perhaps.
> >
> > I've been trying to come up with valid reasons there'd be ever
> > be canonicalization by multiplication, but failed so I guess
> > I'll rip it out.
>
> The "unreachable" thing should quickly tell you if that guess is wrong.
> Not something you want to leave in a production compiler, of course.

I think you misunderstood; I mean that I pondered
"philosophical" reasons to change the canonical representation;
if there was some reasoning, current or looking forward, where
we'd "add back" mult as non-address-canonical RTL.  For the
current scheme, keeping the matching-patterns but with
assertions added might be helpful, yes.

(People looking for practical clues: for spotting dead target
code there are besides the alluded-to gcc_unreachable(), also
the options to call internal_error() with a friendly string or
generating invalid assembly with a descriptive mnemonic.)

Actually, there are two things you could help with:

- (pointer-to) step-by-step instructions to recreate the Linux
(kernel :) builds, like those you did before for multiple
targets.  I'd like to know I fixed your observations.

- add a preferred canonicalization function to do conversion
to/from memory-address-canonical RTL.  Like
fwprop.c:canonicalize_address (just not static :) and maybe also
a canonicalize_nonaddress.  At the moment, ports call
replace_equiv_address (family of functions) when changing
address RTL, but that code-path (at a glance) doesn't
canonicalize whatever non-address-canonical RTL you throw at it.
Maybe it should?

> > > Looks like quite some work for you, I'm sorry about that,
> >
> > It's almost over, at least the editing part...
>
> Great to hear that!

Thanks for your enthusiasm!

Well, I've known about this wart for 6+ years as seen in PR37939
(but before that likely at least as long), so I *should* thank
you for fixing it; it's just that the added churn and required
tweaks have somewhat of a sour taste. :)

brgds, H-P
Segher Boessenkool May 17, 2015, 12:31 p.m. UTC | #11
On Sat, May 16, 2015 at 09:55:08PM -0400, Hans-Peter Nilsson wrote:
> > With a plus or minus combine would always write it as a mult.
> > I don't think any other pass would create this combination.  I
> > haven't tested it though.
> 
> Ports probably also generate that internally at various RTL
> passes, something that takes a bit more than an at-a-glance code
> inspection.

Yeah, true.  Hopefully port maintainers will know about what the
port does, or at least notice the ICEs.

> > > I've been trying to come up with valid reasons there'd be ever
> > > be canonicalization by multiplication, but failed so I guess
> > > I'll rip it out.
> >
> > The "unreachable" thing should quickly tell you if that guess is wrong.
> > Not something you want to leave in a production compiler, of course.
> 
> I think you misunderstood; I mean that I pondered
> "philosophical" reasons to change the canonical representation;
> if there was some reasoning, current or looking forward, where
> we'd "add back" mult as non-address-canonical RTL.

Ah.  Yes, same here -- I don't see any situation where having
mult instead of shift outside of a mem would be more convenient.

> Actually, there are two things you could help with:
> 
> - (pointer-to) step-by-step instructions to recreate the Linux
> (kernel :) build, as those you did before for a multiple of
> targets.  I'd like to know I fixed your observations.

I used <http://git.infradead.org/users/segher/buildall.git>; it has
a README.  I see that doc is a little out of date, I'll update.

> - add a preferred canonicalization function to do conversion
> to/from memory-address-canonical RTL.  Like
> fwprop.c:canonicalize_address (just not static :) and maybe also
> a canonicalize_nonaddress.  At the moment, ports call
> replace_equiv_address (family of functions) when changing
> address RTL, but that code-path (at a glance) doesn't
> canonicalize whatever non-address-canonical RTL you throw at it.
> Maybe it should?

Maybe validize_mem is nicer?  It depends much on what you really
want to change.

Does simplify_rtx not do what you want for "nonaddress"?


Segher
Hans-Peter Nilsson May 17, 2015, 1:32 p.m. UTC | #12
On Sun, 17 May 2015, Segher Boessenkool wrote:
> I used <http://git.infradead.org/users/segher/buildall.git>; it has
> a README.  I see that doc is a little out of date, I'll update.

("git:" not "http:" for cloning)  Thanks, looks useful.
Hm, maybe we already mention this in the wiki...

> > - add a preferred canonicalization function to do conversion
> > to/from memory-address-canonical RTL.  Like
> > fwprop.c:canonicalize_address (just not static :) and maybe also
> > a canonicalize_nonaddress.  At the moment, ports call
> > replace_equiv_address (family of functions) when changing
> > address RTL, but that code-path (at a glance) doesn't
> > canonicalize whatever non-address-canonical RTL you throw at it.
> > Maybe it should?
>
> Maybe validize_mem is nicer?  It depends much what you really
> want to change.

I'd have made use of a function that'd automatically rewrite an
ASHIFT into a MULT.  No, validize_mem is kind-of a sibling to
replace_equiv_address with just the containing MEM and will just
call it, says the code.  A call to replace_equiv_address (& Co.)
will *break out* invalid expressions like an ASHIFT - or at
least is supposed to do that.

But that'd be mostly for convenience of other ports; I've coped
now, I think.

> Does simplify_rtx not do what you want for "nonaddress"?

Oh, right, that's already what we have, thanks.

brgds, H-P
Jeff Law May 19, 2015, 5:28 p.m. UTC | #13
On 05/16/2015 07:55 PM, Hans-Peter Nilsson wrote:
> On Sat, 16 May 2015, Segher Boessenkool wrote:
>> On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:
>>> On Sat, 16 May 2015, Segher Boessenkool wrote:
>>>> On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:
>>>>> I confess the test-case-"guarded" addi pattern should have been
>>>>> expressed with a shift in addition to the multiplication.
>>>>
>>>> But they wouldn't ever match so they might very well have bitrotted
>>>> by now :-(
>>>
>>> It seems you're saying that the canonicalization to "ashift"
>>> didn't work *at all*, when starting with an expression from an
>>> address?  I knew it failed in part, but always thought it was
>>> just a partial failure.
>>
>> With a plus or minus combine would always write it as a mult.
>> I don't think any other pass would create this combination.  I
>> haven't tested it though.
>
> Ports probably also generate that internally at various RTL
> passes, something that takes a bit more than an at-a-glance code
> inspection.
Correct.  The PA port for example has a ton of this kind of RTL 
rewriting to exploit the shift-add insns and scaled indexed addressing 
modes (and correct for some oddities in the PA chip where the scaled 
modes don't exist in every context where you'd think they should).

And you still have to worry about things like combine taking a (mem 
(plus (mult))), selecting the (plus (mult)) as a split point and failing 
to canonicalize it into the ashift form.

I ran into that while fixing up the PA for these changes.  The good news 
is with two trivial combine changes and the expected changes to the PA 
backend, I can get code generation back to where it was before across my 
sample testfiles.


Jeff

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 5c763b4..945abdb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7703,8 +7703,6 @@  make_compound_operation (rtx x, enum rtx_code in_code)
      but once inside, go back to our default of SET.  */

   next_code = (code == MEM ? MEM
-              : ((code == PLUS || code == MINUS)
-                 && SCALAR_INT_MODE_P (mode)) ? MEM
               : ((code == COMPARE || COMPARISON_P (x))
                  && XEXP (x, 1) == const0_rtx) ? COMPARE
               : in_code == COMPARE ? SET : in_code);