diff mbox series

Simplify paradoxical subreg extensions of TRUNCATE

Message ID 001401d7a2a5$5bf07db0$13d17910$@nextmovesoftware.com
State New
Headers show
Series Simplify paradoxical subreg extensions of TRUNCATE | expand

Commit Message

Roger Sayle Sept. 5, 2021, 10:28 p.m. UTC
This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as
(truncate:HI (reg:SI)), and closely related variants.  In RTL, a
paradoxical SUBREG where the outermode is wider than the innermode
is like the extensions zero_extend or sign_extend, but where we
don't care about the contents of the extended bits.  Hence, in the
above case, it's convenient to eliminate the SUBREG by tweaking
the explicit TRUNCATE to include more bits from the original REG.

There are three possibilities: as above where outermode < origmode,
where we can eliminate the paradoxical SUBREG using a wider TRUNCATE;
secondly when outermode == origmode, we can eliminate both the SUBREG
and the TRUNCATE, so (subreg:SI (truncate:QI (reg:SI))) becomes just
(reg:SI); and finally when outermode > origmode, we can eliminate
the TRUNCATE, and generate a paradoxical subreg from the original
source, so (subreg:DI (truncate:QI (reg:SI))) becomes the simpler
(subreg:DI (reg:SI)).

An example benefit of this simplification is that on nvptx-none,
int foo(char x, char y) { return (x=='x') && (y=='y'); }
shrinks from 17 instructions to 13 instructions.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and also on nvptx-none with
no new failures.  Ok for mainline?


2021-09-05  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* simplify-rtx.c (simplify_subreg): Optimize paradoxical subreg
	extensions of TRUNCATE.

Roger
--

Comments

Segher Boessenkool Sept. 6, 2021, 10:14 a.m. UTC | #1
On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
> This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as
> (truncate:HI (reg:SI)), and closely related variants.

Subregs of other than regs are undefined in RTL.  You will first have to
define this (in documentation as well as in other code that handles
subregs).  I doubt this is possible to do, subreg have so many
overloaded meanings already.


Segher
Richard Biener Sept. 6, 2021, 11:11 a.m. UTC | #2
On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as
> > (truncate:HI (reg:SI)), and closely related variants.
>
> Subregs of other than regs are undefined in RTL.  You will first have to
> define this (in documentation as well as in other code that handles
> subregs).  I doubt this is possible to do, subreg have so many
> overloaded meanings already.

I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM
is equal to

  (set (reg:MODE2) (xyz:MODE2 ..))
  (subreg:MODE1 (reg:MODE2) ...)

with 'reg' being a pseudo reg is the (only?) sensible way of defining it.

That would make the simplification apply when substituting the pseudos
definition inside the subreg which maybe is what this targets?

Richard.

>
>
> Segher
Roger Sayle Sept. 6, 2021, 11:32 a.m. UTC | #3
Hi Segher,
I think the current documentation is sufficient.  During compilation, GCC's
combine pass will often substitute a register with an expression defining
it's value, and then attempt to simplify it.  As you point out, this often
produces
(temporary intermediate) expressions with poorly defined semantics.  The
purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE
as an argument (that some folks consider undefined) and instead simplify
them to either a single TRUNCATE or a SUBREG of a REG, both of which are
well defined.  I'm making/helping the implementation match the
documentation.

From the current output of -fdump-rtl-combine-all=stdout

Trying 10 -> 15:
   10: r29:QI=trunc(r32:SI)
      REG_DEAD r32:SI
   15: r38:HI=r29:QI#0
      REG_DEAD r29:QI
Failed to match this instruction:
(set (reg:HI 38)
    (subreg:HI (truncate:QI (reg:SI 32)) 0))

with this patch:

Trying 10 -> 15:
   10: r29:QI=trunc(r32:SI)
      REG_DEAD r32:SI
   15: r38:HI=r29:QI#0
      REG_DEAD r29:QI
Successfully matched this instruction:
(set (reg:HI 38)
    (truncate:HI (reg:SI 32)))
allowing combination of insns 10 and 15
original costs 4 + 12 = 16
replacement cost 4
deferring deletion of insn with uid = 10.


A few lines earlier in this function, it simplifies SUBREGs of CONST_INT,
which
would also have poorly defined semantics.  Perhaps your interpretation of
the
RTL documentation doesn't strictly apply to this part of the RTL optimizers?

Let me know what you think.
Best regards,
Roger
--

-----Original Message-----
From: Segher Boessenkool <segher@kernel.crashing.org> 
Sent: 06 September 2021 11:14
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
> This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as 
> (truncate:HI (reg:SI)), and closely related variants.

Subregs of other than regs are undefined in RTL.  You will first have to
define this (in documentation as well as in other code that handles
subregs).  I doubt this is possible to do, subreg have so many overloaded
meanings already.


Segher
Segher Boessenkool Sept. 6, 2021, 2:24 p.m. UTC | #4
Hi!

On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote:
> I think the current documentation is sufficient.  During compilation, GCC's
> combine pass will often substitute a register with an expression defining
> it's value, and then attempt to simplify it.  As you point out, this often
> produces
> (temporary intermediate) expressions with poorly defined semantics.

Not "poorly defined".  The documentation says (in rtl.texi)

@findex subreg
@item (subreg:@var{m1} @var{reg:m2} @var{bytenum})

@code{subreg} expressions are used to refer to a register in a machine
mode other than its natural one, or to refer to one register of
a multi-part @code{reg} that actually refers to several registers.

It goes on to say there also are subregs of mem currently; it
exhaustively lists all things you can take a subreg of, what the
different semantics of those are, how the semantics are further
"modified" (read: completely different) if some RTL flags are set, etc.

The instruction combiner very often creates invalid RTL (only
structurally valid, like, no missing operands).  Invalid RTL is supposed
to never match (because backends will not have patterns that match
these).  combine even creates invalid RTL on purpose (a clobber of
const0_rtx) to ensure its result is deemed invalid, when it wants to
abort a combination attempt for some reason.

> The
> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE
> as an argument (that some folks consider undefined) and instead simplify
> them to either a single TRUNCATE or a SUBREG of a REG, both of which are
> well defined.  I'm making/helping the implementation match the
> documentation.

But this should never make it to simplify-rtx at all.  You can only
validly do such transformations in combine, because you need to know
what insns you started with etc.

simplify-rtx is a part of combine, sure, but it is used from other
contexts as well nowadays.  If you can safely do this from combine,
you can do it in combine.

> Trying 10 -> 15:
>    10: r29:QI=trunc(r32:SI)
>       REG_DEAD r32:SI
>    15: r38:HI=r29:QI#0
>       REG_DEAD r29:QI
> Failed to match this instruction:
> (set (reg:HI 38)
>     (subreg:HI (truncate:QI (reg:SI 32)) 0))
> 
> with this patch:
> 
> Trying 10 -> 15:
>    10: r29:QI=trunc(r32:SI)
>       REG_DEAD r32:SI
>    15: r38:HI=r29:QI#0
>       REG_DEAD r29:QI
> Successfully matched this instruction:
> (set (reg:HI 38)
>     (truncate:HI (reg:SI 32)))
> allowing combination of insns 10 and 15
> original costs 4 + 12 = 16
> replacement cost 4
> deferring deletion of insn with uid = 10.

(It appears costs are a bit off on your target?  Or is HImode
especially expensive for some reason?)

There is no subreg of a non-reg in there, so this is fine.

> A few lines earlier in this function, it simplifies SUBREGs of CONST_INT,
> which
> would also have poorly defined semantics.  Perhaps your interpretation of
> the
> RTL documentation doesn't strictly apply to this part of the RTL optimizers?

Perhaps this is a bug.  It is if it actually allows subregs of ints as
input.


Segher
Jeff Law Sept. 19, 2021, 3:14 p.m. UTC | #5
On 9/6/2021 8:24 AM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote:
>> I think the current documentation is sufficient.  During compilation, GCC's
>> combine pass will often substitute a register with an expression defining
>> it's value, and then attempt to simplify it.  As you point out, this often
>> produces
>> (temporary intermediate) expressions with poorly defined semantics.
> Not "poorly defined".  The documentation says (in rtl.texi)
>
> @findex subreg
> @item (subreg:@var{m1} @var{reg:m2} @var{bytenum})
>
> @code{subreg} expressions are used to refer to a register in a machine
> mode other than its natural one, or to refer to one register of
> a multi-part @code{reg} that actually refers to several registers.
>
> It goes on to say there also are subregs of mem currently; it
> exhaustively lists all things you can take a subreg of, what the
> different semantics of those are, how the semantics are further
> "modified" (read: completely different) if some RTL flags are set, etc.
>
> The instruction combiner very often creates invalid RTL (only
> structurally valid, like, no missing operands).  Invalid RTL is supposed
> to never match (because backends will not have patterns that match
> these).  combine even creates invalid RTL on purpose (a clobber of
> const0_rtx) to ensure its result is deemed invalid, when it wants to
> abort a combination attempt for some reason.
>
>> The
>> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE
>> as an argument (that some folks consider undefined) and instead simplify
>> them to either a single TRUNCATE or a SUBREG of a REG, both of which are
>> well defined.  I'm making/helping the implementation match the
>> documentation.
> But this should never make it to simplify-rtx at all.  You can only
> validly do such transformations in combine, because you need to know
> what insns you started with etc.
>
> simplify-rtx is a part of combine, sure, but it is used from other
> contexts as well nowadays.  If you can safely do this from combine,
> you can do it in combine.
[ ... ]
So what I think is missing here is some concrete path forward.  If I'm 
understanding your objection Segher, you'd like to see Roger look at 
where these (subreg (truncate)) expressions came from and address them 
earlier than simplify_rtx?  The theory being that the simplification 
bits could be used from other contexts than just combine and in those 
other contexts (subreg (truncate)) isn't valid?

Jeff
Segher Boessenkool Sept. 21, 2021, 12:23 a.m. UTC | #6
Hi!

On Sun, Sep 19, 2021 at 09:14:55AM -0600, Jeff Law wrote:
> On 9/6/2021 8:24 AM, Segher Boessenkool wrote:
> >On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote:
> >>I think the current documentation is sufficient.  During compilation, 
> >>GCC's
> >>combine pass will often substitute a register with an expression defining
> >>it's value, and then attempt to simplify it.  As you point out, this often
> >>produces
> >>(temporary intermediate) expressions with poorly defined semantics.
> >Not "poorly defined".  The documentation says (in rtl.texi)
> >
> >@findex subreg
> >@item (subreg:@var{m1} @var{reg:m2} @var{bytenum})
> >
> >@code{subreg} expressions are used to refer to a register in a machine
> >mode other than its natural one, or to refer to one register of
> >a multi-part @code{reg} that actually refers to several registers.
> >
> >It goes on to say there also are subregs of mem currently; it
> >exhaustively lists all things you can take a subreg of, what the
> >different semantics of those are, how the semantics are further
> >"modified" (read: completely different) if some RTL flags are set, etc.
> >
> >The instruction combiner very often creates invalid RTL (only
> >structurally valid, like, no missing operands).  Invalid RTL is supposed
> >to never match (because backends will not have patterns that match
> >these).  combine even creates invalid RTL on purpose (a clobber of
> >const0_rtx) to ensure its result is deemed invalid, when it wants to
> >abort a combination attempt for some reason.
> >
> >>The
> >>purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE
> >>as an argument (that some folks consider undefined) and instead simplify
> >>them to either a single TRUNCATE or a SUBREG of a REG, both of which are
> >>well defined.  I'm making/helping the implementation match the
> >>documentation.
> >But this should never make it to simplify-rtx at all.  You can only
> >validly do such transformations in combine, because you need to know
> >what insns you started with etc.
> >
> >simplify-rtx is a part of combine, sure, but it is used from other
> >contexts as well nowadays.  If you can safely do this from combine,
> >you can do it in combine.
> [ ... ]
> So what I think is missing here is some concrete path forward.  If I'm 
> understanding your objection Segher, you'd like to see Roger look at 
> where these (subreg (truncate)) expressions came from and address them 
> earlier than simplify_rtx?

There is no such thing as "earlier than simplify-rtx", that is the
point.  simplify-rtx is not a pass: it is like a library that is used
from all over the RTL routines.

If something belongs in combine, it should be in combine, not in
simplify-rtx.

> The theory being that the simplification 
> bits could be used from other contexts than just combine and in those 
> other contexts (subreg (truncate)) isn't valid?

Or not actually simplifying (or canonicalising).  Yup.

It also is the case that invalid RTL should simply never be constructed,
we should not (try to) fix it up later.  I'm not sure if that was what
was happening in this case, too much of this stuff recently, everything
blurs together, sorry :-/

I would love to see this all improved.  But if we just poke at it with
a stick we'll be in worse shape instead of better by the time stage 1
closes.  So we need a plan, and/or a topic tree (some devel/* branch or
whatever).


Segher
Jeff Law Sept. 21, 2021, 4:18 a.m. UTC | #7
On 9/20/2021 6:23 PM, Segher Boessenkool wrote:
> Hi!
>
> On Sun, Sep 19, 2021 at 09:14:55AM -0600, Jeff Law wrote:
>> On 9/6/2021 8:24 AM, Segher Boessenkool wrote:
>>> On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote:
>>>> I think the current documentation is sufficient.  During compilation,
>>>> GCC's
>>>> combine pass will often substitute a register with an expression defining
>>>> it's value, and then attempt to simplify it.  As you point out, this often
>>>> produces
>>>> (temporary intermediate) expressions with poorly defined semantics.
>>> Not "poorly defined".  The documentation says (in rtl.texi)
>>>
>>> @findex subreg
>>> @item (subreg:@var{m1} @var{reg:m2} @var{bytenum})
>>>
>>> @code{subreg} expressions are used to refer to a register in a machine
>>> mode other than its natural one, or to refer to one register of
>>> a multi-part @code{reg} that actually refers to several registers.
>>>
>>> It goes on to say there also are subregs of mem currently; it
>>> exhaustively lists all things you can take a subreg of, what the
>>> different semantics of those are, how the semantics are further
>>> "modified" (read: completely different) if some RTL flags are set, etc.
>>>
>>> The instruction combiner very often creates invalid RTL (only
>>> structurally valid, like, no missing operands).  Invalid RTL is supposed
>>> to never match (because backends will not have patterns that match
>>> these).  combine even creates invalid RTL on purpose (a clobber of
>>> const0_rtx) to ensure its result is deemed invalid, when it wants to
>>> abort a combination attempt for some reason.
>>>
>>>> The
>>>> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE
>>>> as an argument (that some folks consider undefined) and instead simplify
>>>> them to either a single TRUNCATE or a SUBREG of a REG, both of which are
>>>> well defined.  I'm making/helping the implementation match the
>>>> documentation.
>>> But this should never make it to simplify-rtx at all.  You can only
>>> validly do such transformations in combine, because you need to know
>>> what insns you started with etc.
>>>
>>> simplify-rtx is a part of combine, sure, but it is used from other
>>> contexts as well nowadays.  If you can safely do this from combine,
>>> you can do it in combine.
>> [ ... ]
>> So what I think is missing here is some concrete path forward.  If I'm
>> understanding your objection Segher, you'd like to see Roger look at
>> where these (subreg (truncate)) expressions came from and address them
>> earlier than simplify_rtx?
> There is no such thing as "earlier than simplify-rtx", that is the
> point.  simplify-rtx is not a pass: it is like a library that is used
> from all over the RTL routines.
I'm referring to earlier in the call chain, not an earlier pass. Sorry I 
wasn't clear about that.

If we were catching the scenario which led to the creation of (subreg 
(truncate)) in combine and instead of creating (subreg (truncate)) we 
instead created the simplified, correct form would that ease your concerns?



Jeff
Richard Biener Sept. 21, 2021, 6:24 a.m. UTC | #8
On Tue, Sep 21, 2021 at 6:18 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 9/20/2021 6:23 PM, Segher Boessenkool wrote:
> > Hi!
> >
> > On Sun, Sep 19, 2021 at 09:14:55AM -0600, Jeff Law wrote:
> >> On 9/6/2021 8:24 AM, Segher Boessenkool wrote:
> >>> On Mon, Sep 06, 2021 at 12:32:13PM +0100, Roger Sayle wrote:
> >>>> I think the current documentation is sufficient.  During compilation,
> >>>> GCC's
> >>>> combine pass will often substitute a register with an expression defining
> >>>> it's value, and then attempt to simplify it.  As you point out, this often
> >>>> produces
> >>>> (temporary intermediate) expressions with poorly defined semantics.
> >>> Not "poorly defined".  The documentation says (in rtl.texi)
> >>>
> >>> @findex subreg
> >>> @item (subreg:@var{m1} @var{reg:m2} @var{bytenum})
> >>>
> >>> @code{subreg} expressions are used to refer to a register in a machine
> >>> mode other than its natural one, or to refer to one register of
> >>> a multi-part @code{reg} that actually refers to several registers.
> >>>
> >>> It goes on to say there also are subregs of mem currently; it
> >>> exhaustively lists all things you can take a subreg of, what the
> >>> different semantics of those are, how the semantics are further
> >>> "modified" (read: completely different) if some RTL flags are set, etc.
> >>>
> >>> The instruction combiner very often creates invalid RTL (only
> >>> structurally valid, like, no missing operands).  Invalid RTL is supposed
> >>> to never match (because backends will not have patterns that match
> >>> these).  combine even creates invalid RTL on purpose (a clobber of
> >>> const0_rtx) to ensure its result is deemed invalid, when it wants to
> >>> abort a combination attempt for some reason.
> >>>
> >>>> The
> >>>> purpose of my patch is to avoid (constructing) SUBREGs that have TRUNCATE
> >>>> as an argument (that some folks consider undefined) and instead simplify
> >>>> them to either a single TRUNCATE or a SUBREG of a REG, both of which are
> >>>> well defined.  I'm making/helping the implementation match the
> >>>> documentation.
> >>> But this should never make it to simplify-rtx at all.  You can only
> >>> validly do such transformations in combine, because you need to know
> >>> what insns you started with etc.
> >>>
> >>> simplify-rtx is a part of combine, sure, but it is used from other
> >>> contexts as well nowadays.  If you can safely do this from combine,
> >>> you can do it in combine.
> >> [ ... ]
> >> So what I think is missing here is some concrete path forward.  If I'm
> >> understanding your objection Segher, you'd like to see Roger look at
> >> where these (subreg (truncate)) expressions came from and address them
> >> earlier than simplify_rtx?
> > There is no such thing as "earlier than simplify-rtx", that is the
> > point.  simplify-rtx is not a pass: it is like a library that is used
> > from all over the RTL routines.
> I'm referring to earlier in the call chain, not an earlier pass. Sorry I
> wasn't clear about that.
>
> If we were catching the scenario which led to the creation of (subreg
> (truncate)) in combine and instead of creating (subreg (truncate)) we
> instead created the simplified, correct form would that ease your concerns?

Sorry to chime in, but if (subreg (truncate ...)) is invalid it would be OK for
simplify_gen_subreg () to
 1) return NULL_RTX
 2) return valid RTX, in this case the suggested canoncalized form
I agree that simplify_gen_subreg should not return (subreg (truncate ..)).

So I'm not understanding if you are suggesting that even calling
simplify_gen_subreg on a (truncate ...) is invalid?

Now, the patch changes simplify_subreg where it's not clear whether
the subreg already exists (it should not) or whether we are about to
construct it, seeing if there's a valid RTX that expresses the semantic
as if the op were pushed to a pseudo (aka we're combining on the fly).

It's a bit like fold_convert (double_type_node, integer_one_node)
second-guessing and doing a FLOAT_EXPR when folding
the implicit CONVERT_EXPR.

Richard.

>
>
> Jeff
Richard Sandiford Sept. 21, 2021, 12:01 p.m. UTC | #9
[Using this is a convenient place to reply to the thread as a whole]

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>>
>> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
>> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as
>> > (truncate:HI (reg:SI)), and closely related variants.
>>
>> Subregs of other than regs are undefined in RTL.  You will first have to
>> define this (in documentation as well as in other code that handles
>> subregs).  I doubt this is possible to do, subreg have so many
>> overloaded meanings already.
>
> I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM
> is equal to
>
>   (set (reg:MODE2) (xyz:MODE2 ..))
>   (subreg:MODE1 (reg:MODE2) ...)
>
> with 'reg' being a pseudo reg is the (only?) sensible way of defining it.

Agreed.  And I think that's already the de facto definition (and has been
for a long time).  Subreg as an operation has to have defined semantics
for all the cases that simplify_subreg handles, otherwise we have GIGO
and a lot of the function could be deleted.  We can (and do) choose
to prevent some of those operations becoming actual rtxes, but even there,
the de facto rules are quite broad.  E.g.:

- Like you said later, simplify_gen_subreg is opt-out rather than opt-in
  in terms of the subreg rtxes that it's prepared to create.

- Even rs6000.md has:

    (define_insn "*<su>mul<mode>3_highpart"
      [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
            (subreg:GPR
              (mult:<DMODE> (any_extend:<DMODE>
                              (match_operand:GPR 1 "gpc_reg_operand" "r"))
                            (any_extend:<DMODE>
                              (match_operand:GPR 2 "gpc_reg_operand" "r")))
             0))]
      "WORDS_BIG_ENDIAN && !(<MODE>mode == SImode && TARGET_POWERPC64)"
      "mulh<wd><u> %0,%1,%2"
      [(set_attr "type" "mul")
       (set_attr "size" "<bits>")])

  Many other ports have similar patterns.

The problem with “combine can generate invalid rtl but backends
should reject it” is that, generally, people write combine patterns
by looking at what combine _wants_ to generate and then writing
.md patterns to match that.  In other words, combine in practice
defines the (de facto) correct rtl representation of a combined
sequence.

Given:

Trying 10 -> 15:
   10: r29:QI=trunc(r32:SI)
      REG_DEAD r32:SI
   15: r38:HI=r29:QI#0
      REG_DEAD r29:QI
Failed to match this instruction:
(set (reg:HI 38)
    (subreg:HI (truncate:QI (reg:SI 32)) 0))

I'm sure there's a temptation to add an .md pattern that matches
the subreg. :-)

Thanks,
Richard

> That would make the simplification apply when substituting the pseudos
> definition inside the subreg which maybe is what this targets?
>
> Richard.
>
>>
>>
>> Segher
Segher Boessenkool Sept. 21, 2021, 12:24 p.m. UTC | #10
On Mon, Sep 20, 2021 at 10:18:15PM -0600, Jeff Law wrote:
> On 9/20/2021 6:23 PM, Segher Boessenkool wrote:
> >There is no such thing as "earlier than simplify-rtx", that is the
> >point.  simplify-rtx is not a pass: it is like a library that is used
> >from all over the RTL routines.
> I'm referring to earlier in the call chain, not an earlier pass. Sorry I 
> wasn't clear about that.

Ah okay, I see.

> If we were catching the scenario which led to the creation of (subreg 
> (truncate)) in combine and instead of creating (subreg (truncate)) we 
> instead created the simplified, correct form would that ease your concerns?

Yes please.  And combine has access to the original instructions, so can
make sure to only ever create something with the same semantics.

Thanks,


Segher
Segher Boessenkool Sept. 21, 2021, 12:34 p.m. UTC | #11
On Tue, Sep 21, 2021 at 08:24:04AM +0200, Richard Biener wrote:
> On Tue, Sep 21, 2021 at 6:18 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> > If we were catching the scenario which led to the creation of (subreg
> > (truncate)) in combine and instead of creating (subreg (truncate)) we
> > instead created the simplified, correct form would that ease your concerns?
> 
> Sorry to chime in,

No, please do!

> but if (subreg (truncate ...)) is invalid it would be OK for
> simplify_gen_subreg () to
>  1) return NULL_RTX

Yes, that is valid, but the simplify-rtx routines (as well as pretty
much everything dealing with RTL) doesn't have to check it is valid RTL,
as a matter of principle: it can just assume anything passed around *is*
valid.  Anything simplify-rtx gets as input should be valid RTL, and the
only thing it should be concerned with is *generating* only valid RTL as
well :-)

>  2) return valid RTX, in this case the suggested canoncalized form
> I agree that simplify_gen_subreg should not return (subreg (truncate ..)).
> 
> So I'm not understanding if you are suggesting that even calling
> simplify_gen_subreg on a (truncate ...) is invalid?

What would the semantics be?  A subreg has a reg as operand, and its
semantics are different for pseudos and for hard regs!

We could define the semantics of calling simplify_gen_subreg on
non-regs, but that will make the interface only more confused imo,
much harder to use correctly.

> Now, the patch changes simplify_subreg where it's not clear whether
> the subreg already exists (it should not) or whether we are about to
> construct it, seeing if there's a valid RTX that expresses the semantic
> as if the op were pushed to a pseudo (aka we're combining on the fly).
> 
> It's a bit like fold_convert (double_type_node, integer_one_node)
> second-guessing and doing a FLOAT_EXPR when folding
> the implicit CONVERT_EXPR.

Yeah :-/


Segher
Roger Sayle Sept. 21, 2021, 12:54 p.m. UTC | #12
That define_insn is making my eyes bleed!  I think that's the most convincing
argument I've ever read on gcc-patches, and I can see now what Segher is so
opposed to.  My new goal is to add sh_mul and uh_mul as RTL expression
codes for highpart multiplication, so that that pattern will go away [and
never ever been seen again].

Fortunately, the proposed patch is part of the solution, rather than part of
the problem.  simplify_subreg is used to find a sensible alternate form for
a hypothetical SUBREG before it's created (and normally it'll only be created
if it's valid c.f. validate_subreg in simplify_gen_subreg).  This function is called
over 30 million times during stage2/stage3 of a bootstrap on x86_64, with
the most common "SUBREG" operand RTX codes being:

16160294 reg
5627157 const_int
3278692 mem
 960839 lshiftrt
 902685 plus
 690246 and
 430131 subreg
 319407 zero_extract
 319145 minus
 255043 value
 232959 ashift
 225468 mult
 217819 xor
 207645 clobber
 207454 ashiftrt
 100213 ior

Adding a simplify_subreg for mult, replacing them with a highpart rtx should
avoid (reduce) combine's Frankenstein patterns appearing in machine descriptions.
Presumably, the reason that this define_insn starts with a "*" even for a recognized
named pattern, is that if expand ever attempted emitting this instruction, the
internal sanity checks for a sane/valid SUBREG would fail/ICE.

p.s. You're right that there's a temptation to add ugly patterns to a backend,
instead of fixing their simplification in the middle-end; but only when backend
patches get reviewed faster/easier than middle-end patches 😊

Awesome (counter-)example!

Roger
--

-----Original Message-----
From: Richard Sandiford <richard.sandiford@arm.com> 
Sent: 21 September 2021 13:02
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>; Richard Biener <richard.guenther@gmail.com>; Roger Sayle <roger@nextmovesoftware.com>
Subject: Re: [PATCH] Simplify paradoxical subreg extensions of TRUNCATE

[Using this is a convenient place to reply to the thread as a whole]

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool 
> <segher@kernel.crashing.org> wrote:
>>
>> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
>> > This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as 
>> > (truncate:HI (reg:SI)), and closely related variants.
>>
>> Subregs of other than regs are undefined in RTL.  You will first have 
>> to define this (in documentation as well as in other code that 
>> handles subregs).  I doubt this is possible to do, subreg have so 
>> many overloaded meanings already.
>
> I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM is 
> equal to
>
>   (set (reg:MODE2) (xyz:MODE2 ..))
>   (subreg:MODE1 (reg:MODE2) ...)
>
> with 'reg' being a pseudo reg is the (only?) sensible way of defining it.

Agreed.  And I think that's already the de facto definition (and has been for a long time).  Subreg as an operation has to have defined semantics for all the cases that simplify_subreg handles, otherwise we have GIGO and a lot of the function could be deleted.  We can (and do) choose to prevent some of those operations becoming actual rtxes, but even there, the de facto rules are quite broad.  E.g.:

- Like you said later, simplify_gen_subreg is opt-out rather than opt-in
  in terms of the subreg rtxes that it's prepared to create.

- Even rs6000.md has:

    (define_insn "*<su>mul<mode>3_highpart"
      [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
            (subreg:GPR
              (mult:<DMODE> (any_extend:<DMODE>
                              (match_operand:GPR 1 "gpc_reg_operand" "r"))
                            (any_extend:<DMODE>
                              (match_operand:GPR 2 "gpc_reg_operand" "r")))
             0))]
      "WORDS_BIG_ENDIAN && !(<MODE>mode == SImode && TARGET_POWERPC64)"
      "mulh<wd><u> %0,%1,%2"
      [(set_attr "type" "mul")
       (set_attr "size" "<bits>")])

  Many other ports have similar patterns.

The problem with “combine can generate invalid rtl but backends should reject it” is that, generally, people write combine patterns by looking at what combine _wants_ to generate and then writing .md patterns to match that.  In other words, combine in practice defines the (de facto) correct rtl representation of a combined sequence.

Given:

Trying 10 -> 15:
   10: r29:QI=trunc(r32:SI)
      REG_DEAD r32:SI
   15: r38:HI=r29:QI#0
      REG_DEAD r29:QI
Failed to match this instruction:
(set (reg:HI 38)
    (subreg:HI (truncate:QI (reg:SI 32)) 0))

I'm sure there's a temptation to add an .md pattern that matches the subreg. :-)

Thanks,
Richard

> That would make the simplification apply when substituting the pseudos 
> definition inside the subreg which maybe is what this targets?
>
> Richard.
>
>>
>>
>> Segher
Jeff Law Sept. 22, 2021, 5:48 p.m. UTC | #13
On 9/21/2021 6:01 AM, Richard Sandiford via Gcc-patches wrote:
> [Using this is a convenient place to reply to the thread as a whole]
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Mon, Sep 6, 2021 at 12:15 PM Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>>> On Sun, Sep 05, 2021 at 11:28:30PM +0100, Roger Sayle wrote:
>>>> This patch simplifies the RTX (subreg:HI (truncate:QI (reg:SI))) as
>>>> (truncate:HI (reg:SI)), and closely related variants.
>>> Subregs of other than regs are undefined in RTL.  You will first have to
>>> define this (in documentation as well as in other code that handles
>>> subregs).  I doubt this is possible to do, subreg have so many
>>> overloaded meanings already.
>> I suppose (subreg:MODE1 (xyz:MODE2 ..)) where xyz is not REG or MEM
>> is equal to
>>
>>    (set (reg:MODE2) (xyz:MODE2 ..))
>>    (subreg:MODE1 (reg:MODE2) ...)
>>
>> with 'reg' being a pseudo reg is the (only?) sensible way of defining it.
> Agreed.  And I think that's already the de facto definition (and has been
> for a long time).  Subreg as an operation has to have defined semantics
> for all the cases that simplify_subreg handles, otherwise we have GIGO
> and a lot of the function could be deleted.  We can (and do) choose
> to prevent some of those operations becoming actual rtxes, but even there,
> the de facto rules are quite broad.  E.g.:
>
> - Like you said later, simplify_gen_subreg is opt-out rather than opt-in
>    in terms of the subreg rtxes that it's prepared to create.
>
> - Even rs6000.md has:
>
>      (define_insn "*<su>mul<mode>3_highpart"
>        [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>              (subreg:GPR
>                (mult:<DMODE> (any_extend:<DMODE>
>                                (match_operand:GPR 1 "gpc_reg_operand" "r"))
>                              (any_extend:<DMODE>
>                                (match_operand:GPR 2 "gpc_reg_operand" "r")))
>               0))]
>        "WORDS_BIG_ENDIAN && !(<MODE>mode == SImode && TARGET_POWERPC64)"
>        "mulh<wd><u> %0,%1,%2"
>        [(set_attr "type" "mul")
>         (set_attr "size" "<bits>")])
>
>    Many other ports have similar patterns.
>
> The problem with “combine can generate invalid rtl but backends
> should reject it” is that, generally, people write combine patterns
> by looking at what combine _wants_ to generate and then writing
> .md patterns to match that.  In other words, combine in practice
> defines the (de facto) correct rtl representation of a combined
> sequence.
Yup.   In fact, I'm having this exact concern with an internal chunk of 
work right now :-)


>
> Given:
>
> Trying 10 -> 15:
>     10: r29:QI=trunc(r32:SI)
>        REG_DEAD r32:SI
>     15: r38:HI=r29:QI#0
>        REG_DEAD r29:QI
> Failed to match this instruction:
> (set (reg:HI 38)
>      (subreg:HI (truncate:QI (reg:SI 32)) 0))
>
> I'm sure there's a temptation to add an .md pattern that matches
> the subreg. :-)
And that's why my stated position is that any subreg in a backend 
pattern needs to be justified.  Obviously all kinds exist, but when I 
see them I ask for a justification.

Jeff
Jeff Law Sept. 22, 2021, 5:50 p.m. UTC | #14
On 9/21/2021 6:54 AM, Roger Sayle wrote:
> That define_insn is making my eyes bleed!  I think that's the most convincing
> argument I've ever read on gcc-patches, and I can see now what Segher is so
> opposed to.
Then you haven't seen enough patterns or your eyes haven't toughened up 
through the years :-)


Jeff
diff mbox series

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb..3040136 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -7403,13 +7403,34 @@  simplify_context::simplify_subreg (machine_mode outermode, rtx op,
 	  return immed_wide_int_const (val, int_outermode);
 	}
 
-      if (GET_MODE_PRECISION (int_outermode)
-	  < GET_MODE_PRECISION (int_innermode))
+      unsigned int outerprec = GET_MODE_PRECISION (int_outermode);
+      unsigned int innerprec = GET_MODE_PRECISION (int_innermode);
+      if (outerprec < innerprec)
 	{
 	  rtx tem = simplify_truncation (int_outermode, op, int_innermode);
 	  if (tem)
 	    return tem;
 	}
+      else if (outerprec > innerprec
+	       && GET_CODE (op) == TRUNCATE)
+	{
+	  /* Optimize paradoxial subreg extension of a truncate, where
+	     we can eliminate the truncation, or widen the truncation
+	     to the desired mode.  */
+	  scalar_int_mode int_origmode;
+	  rtx orig = XEXP (op, 0);
+	  if (is_a <scalar_int_mode> (GET_MODE (orig), &int_origmode))
+	    {
+	      unsigned int origprec = GET_MODE_PRECISION (int_origmode);
+	      if (outerprec < origprec)
+		return simplify_gen_unary (TRUNCATE, outermode, orig,
+					   GET_MODE (orig));
+	      else if (outerprec > origprec)
+		return lowpart_subreg (int_outermode, orig, int_origmode);
+	      else if (outermode == GET_MODE (orig))
+		return orig;
+	    }
+	}
     }
 
   /* If OP is a vector comparison and the subreg is not changing the