Fix PR rtl-optimization/87727

Message ID	3499484.ePniFpkjG6@polaris
Headers	From: Eric Botcazou <ebotcazou@adacore.com>; To: gcc-patches@gcc.gnu.org; Subject: [patch] Fix PR rtl-optimization/87727; Date: Thu, 20 Dec 2018 12:41:16 +0100

Eric Botcazou Dec. 20, 2018, 11:41 a.m. UTC

Hi,

this is a regression introduced on the SPARC by the somewhat controversial 
combiner change for hard registers: the compiler can no longer apply the leaf 
registers optimization to a small function so a register window is now used.

The combiner change might be an overall win, but my understanding is that it's 
dependent on the target and SPARC seems to be in the wrong basket: almost all 
changes to the gcc.c-torture/compile testsuite at -O2 are pessimizations in 
the form of additional move instructions between registers on function entry.

Clearly that's counter-productive for a LEAF_REGISTERS target like SPARC so 
the proposed fix is to re-enable hard register combining for leaf registers.

Tested on SPARC/Solaris 11, OK for the mainline?


2018-12-20  Eric Botcazou  <ebotcazou@adacore.com>

	PR rtl-optimization/87727
	* combine.c (cant_combine_insn_p): On a LEAF_REGISTERS target, combine
	again moves from leaf hard registers.
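For readers unfamiliar with the mechanism, the rule described in the ChangeLog can be sketched as a toy predicate. This is not the combine.c code: the 8-register machine, the `fixed_regs_toy[]` and `leaf_registers_toy[]` tables and the function name are hypothetical stand-ins for `fixed_reg_set`, `LEAF_REGISTERS` and `cant_combine_insn_p`, and the real function checks more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical machine: hard registers 0..7, pseudos numbered from 8.  */
#define FIRST_PSEUDO_TOY 8

/* Hypothetical stand-in for the fixed registers (e.g. a hardwired zero).  */
static const bool fixed_regs_toy[FIRST_PSEUDO_TOY] = { 1, 0, 0, 0, 0, 0, 0, 0 };

/* Hypothetical stand-in for LEAF_REGISTERS: true if the register can be
   used in a leaf function without allocating a register window.  */
static const bool leaf_registers_toy[FIRST_PSEUDO_TOY] = { 1, 1, 1, 1, 0, 0, 1, 1 };

/* Toy predicate for a register-to-register move DEST = SRC: return true
   if combine should leave the move alone.  In this model, moves out of
   non-fixed hard registers are left to the register allocator, except
   that leaf registers are combinable again on targets that have them
   (modelled here by the have_leaf_registers flag).  */
static bool
toy_cant_combine_move_p (int src_regno, bool have_leaf_registers)
{
  assert (src_regno >= 0);
  if (src_regno >= FIRST_PSEUDO_TOY)
    return false;                         /* pseudo source: combinable */
  if (fixed_regs_toy[src_regno])
    return false;                         /* fixed hard reg: combinable */
  if (have_leaf_registers && leaf_registers_toy[src_regno])
    return false;                         /* leaf hard reg: combine again */
  return true;                            /* other hard reg: leave to RA */
}
```

With have_leaf_registers set, a move out of register 1 becomes combinable again, while register 4 (not a leaf register in this toy table) is still left to the allocator.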

Richard Biener Dec. 20, 2018, 12:15 p.m. UTC | #1

On Thu, Dec 20, 2018 at 12:43 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> Hi,
>
> this is a regression introduced on the SPARC by the somewhat controversial
> combiner change for hard registers: the compiler can no longer apply the leaf
> registers optimization to a small function so a register window is now used.
>
> The combiner change might be an overall win, but my understanding is that it's
> dependent on the target and SPARC seems to be in the wrong basket: almost all
> changes to the gcc.c-torture/compile testsuite at -O2 are pessimizations in
> the form of additional move instructions between registers on function entry.
>
> Clearly that's counter-productive for a LEAF_REGISTERS target like SPARC so
> the proposed fix is to re-enable hard register combining for leaf registers.
>
> Tested on SPARC/Solaris 11, OK for the mainline?

This only affects xtensa besides sparc so unless Segher objects this is OK.

Does this solve most of the pessimizations?

Please add a testcase if it doesn't solve existing FAILs.

Thanks,
Richard.

>
> 2018-12-20  Eric Botcazou  <ebotcazou@adacore.com>
>
>         PR rtl-optimization/87727
>         * combine.c (cant_combine_insn_p): On a LEAF_REGISTERS target, combine
>         again moves from leaf hard registers.
>
> --
> Eric Botcazou

Jakub Jelinek Dec. 20, 2018, 12:23 p.m. UTC | #2

On Thu, Dec 20, 2018 at 01:15:53PM +0100, Richard Biener wrote:
> On Thu, Dec 20, 2018 at 12:43 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
> > this is a regression introduced on the SPARC by the somewhat controversial
> > combiner change for hard registers: the compiler can no longer apply the leaf
> > registers optimization to a small function so a register window is now used.
> >
> > The combiner change might be an overall win, but my understanding is that it's
> > dependent on the target and SPARC seems to be in the wrong basket: almost all
> > changes to the gcc.c-torture/compile testsuite at -O2 are pessimizations in
> > the form of additional move instructions between registers on function entry.
> >
> > Clearly that's counter-productive for a LEAF_REGISTERS target like SPARC so
> > the proposed fix is to re-enable hard register combining for leaf registers.
> >
> > Tested on SPARC/Solaris 11, OK for the mainline?
> 
> This only affects xtensa besides sparc so unless Segher objects this is OK.
> 
> Does this solve most of the pessimizations?
> 
> Please add a testcase if it doesn't solve existing FAILs.

Generally it would be better to deal with that in RA, but if Vlad doesn't
have cycles for it right now, your hack isn't that bad.

> > 2018-12-20  Eric Botcazou  <ebotcazou@adacore.com>
> >
> >         PR rtl-optimization/87727
> >         * combine.c (cant_combine_insn_p): On a LEAF_REGISTERS target, combine
> >         again moves from leaf hard registers.

	Jakub

Segher Boessenkool Dec. 20, 2018, 7:31 p.m. UTC | #3

On Thu, Dec 20, 2018 at 01:23:57PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 20, 2018 at 01:15:53PM +0100, Richard Biener wrote:
> > On Thu, Dec 20, 2018 at 12:43 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
> > > this is a regression introduced on the SPARC by the somewhat controversial
> > > combiner change for hard registers: the compiler can no longer apply the leaf
> > > registers optimization to a small function so a register window is now used.
> > >
> > > The combiner change might be an overall win, but my understanding is that it's
> > > dependent on the target and SPARC seems to be in the wrong basket: almost all
> > > changes to the gcc.c-torture/compile testsuite at -O2 are pessimizations in
> > > the form of additional move instructions between registers on function entry.
> > >
> > > Clearly that's counter-productive for a LEAF_REGISTERS target like SPARC so
> > > the proposed fix is to re-enable hard register combining for leaf registers.
> > >
> > > Tested on SPARC/Solaris 11, OK for the mainline?
> > 
> > This only affects xtensa besides sparc so unless Segher objects this is OK.
> > 
> > Does this solve most of the pessimizations?
> > 
> > Please add a testcase if it doesn't solve existing FAILs.
> 
> Generally it would be better to deal with that in RA, but if Vlad doesn't
> have cycles for it right now, your hack isn't that bad.

It's not a terrible workaround, no.  It looks like it will make some asm
once again fail though?  If argument registers are forwarded to in the asm.


Segher

Eric Botcazou Dec. 20, 2018, 7:40 p.m. UTC | #4

> Does this solve most of the pessimizations?

Yes, it does.

> Please add a testcase if it doesn't solve existing FAILs.

It fixes gcc.target/sparc/overflow-2.c.

Eric Botcazou Dec. 20, 2018, 8:07 p.m. UTC | #5

> It's not a terrible workaround, no.  It looks like it will make some asm
> once again fail though?  If argument registers are forwarded to in the asm.

The combiner change looks like a big hammer for such a corner case though.

Segher Boessenkool Dec. 20, 2018, 8:26 p.m. UTC | #6

On Thu, Dec 20, 2018 at 09:07:41PM +0100, Eric Botcazou wrote:
> > It's not a terrible workaround, no.  It looks like it will make some asm
> > once again fail though?  If argument registers are forwarded to in the asm.
> 
> The combiner change looks like a big hammer for such a corner case though.

That isn't the primary goal, no; the primary goal is to get better RA (by
not having the combiner do RA's work, and doing a mediocre job of it).  It
does solve these problems though.

Segher

Peter Bergner Dec. 20, 2018, 9:30 p.m. UTC | #7

On 12/20/18 2:26 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 09:07:41PM +0100, Eric Botcazou wrote:
>>> It's not a terrible workaround, no.  It looks like it will make some asm
>>> once again fail though?  If argument registers are forwarded to in the asm.
>>
>> The combiner change looks like a big hammer for such a corner case though.
> 
> That isn't the primary goal, no; the primary goal is to get better RA (by
> not having the combiner do RA's work, and doing a mediocre job of it).  It
> does solve these problems though.

My recent RA patches improved conflict information by noticing when we
don't need a conflict for a copy insn between a hard reg and a pseudo
when the source operand is live past the copy.  However, conflict info
for copies between two pseudos wasn't affected because IRA/LRA computes
liveness differently for pseudos than for hard regs.  I have seen copies
between two pseudos still get to RA and we're not able to do anything
about them at the moment because we still think they conflict.
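The copy rule Peter describes is the classic one from Chaitin-style allocators. A minimal sketch, assuming a toy dense conflict matrix and a hypothetical `record_def_conflicts` helper; this is not IRA's code, and it ignores IRA's separate liveness handling for pseudos and hard registers:

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS_TOY 8   /* registers and pseudos in the toy graph */

typedef struct
{
  bool edge[NREGS_TOY][NREGS_TOY];
} conflict_graph;

static void
add_conflict (conflict_graph *g, int a, int b)
{
  if (a != b)
    g->edge[a][b] = g->edge[b][a] = true;
}

/* Record the conflicts created by an insn that sets DEF while the
   registers flagged in LIVE_AFTER are live after the insn.  For a copy
   DEF = COPY_SRC (COPY_SRC >= 0), DEF and COPY_SRC hold the same value
   at this point, so no DEF/COPY_SRC conflict is added even if COPY_SRC
   stays live past the copy.  A later redefinition of either register
   while the other is live still adds the conflict when that insn is
   processed.  Pass COPY_SRC = -1 for an ordinary insn.  */
static void
record_def_conflicts (conflict_graph *g, int def, int copy_src,
                      const bool live_after[NREGS_TOY])
{
  assert (def >= 0 && def < NREGS_TOY);
  for (int r = 0; r < NREGS_TOY; r++)
    if (live_after[r] && r != copy_src)
      add_conflict (g, def, r);
}
```

Treating the copy specially is what lets the allocator later give DEF and COPY_SRC the same register and delete the copy.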

For stage1, I'd like to fix that conflict wart if I can.  I have also
wondered about adding a copy coalesce phase just before we enter RA,
which would ensure the copies are removed, instead of hoping RA assigns
the same reg to the source and destination of the copy making it a nop
that can be removed.

Peter

Jeff Law Dec. 20, 2018, 10:41 p.m. UTC | #8

On 12/20/18 2:30 PM, Peter Bergner wrote:
> On 12/20/18 2:26 PM, Segher Boessenkool wrote:
>> On Thu, Dec 20, 2018 at 09:07:41PM +0100, Eric Botcazou wrote:
>>>> It's not a terrible workaround, no.  It looks like it will make some asm
>>>> once again fail though?  If argument registers are forwarded to in the asm.
>>>
>>> The combiner change looks like a big hammer for such a corner case though.
>>
>> That isn't the primary goal, no; the primary goal is to get better RA (by
>> not having the combiner do RA's work, and doing a mediocre job of it).  It
>> does solve these problems though.
> 
> My recent RA patches improved conflict information by noticing when we
> don't need a conflict for a copy insn between a hard reg and a pseudo
> when the source operand is live past the copy.  However, conflict info
> for copies between two pseudos wasn't affected because IRA/LRA computes
> liveness differently for pseudos than for hard regs.  I have seen copies
> between two pseudos still get to RA and we're not able to do anything
> about them at the moment because we still think they conflict.
> 
> For stage1, I'd like to fix that conflict wart if I can.  I have also
> wondered about adding a copy coalesce phase just before we enter RA,
> which would ensure the copies are removed, instead of hoping RA assigns
> the same reg to the source and destination of the copy making it a nop
> that can be removed.
The difficulty with coalescing is that if you get too aggressive then
you end up removing degrees of freedom from the allocator and you can
easily make the final results worse.
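One standard answer to this concern is conservative coalescing, which only merges two nodes when the merge provably does not hurt colourability. A sketch of Briggs' conservative test on a toy adjacency matrix (the graph size, `k` and the function names are assumptions for illustration, not GCC code):

```c
#include <assert.h>
#include <stdbool.h>

#define NNODES 8   /* nodes in the toy interference graph */

static int
degree (bool adj[NNODES][NNODES], int n)
{
  int d = 0;
  for (int m = 0; m < NNODES; m++)
    d += adj[n][m];
  return d;
}

/* Briggs' conservative test: coalescing non-interfering nodes A and B is
   accepted for K colours only if the merged node has fewer than K
   neighbours of significant degree (>= K).  The insignificant neighbours
   can be simplified away first, leaving the merged node with fewer than
   K neighbours, so the merged node itself can still be simplified.  */
static bool
briggs_can_coalesce (bool adj[NNODES][NNODES], int a, int b, int k)
{
  assert (a != b && !adj[a][b]);
  int significant = 0;
  for (int n = 0; n < NNODES; n++)
    {
      if (n == a || n == b || !(adj[a][n] || adj[b][n]))
        continue;
      /* A neighbour of both A and B loses one edge in the merged graph.  */
      int d = degree (adj, n) - (adj[a][n] && adj[b][n]);
      if (d >= k)
        significant++;
    }
  return significant < k;
}
```

With k = 2, merging nodes 0 and 1 is accepted while their neighbours have low degree and rejected once those neighbours reach degree 3, which is exactly the loss of freedom the test is designed to avoid.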

Jeff

Peter Bergner Dec. 20, 2018, 11:14 p.m. UTC | #9

On 12/20/18 4:41 PM, Jeff Law wrote:
> On 12/20/18 2:30 PM, Peter Bergner wrote:
>> For stage1, I'd like to fix that conflict wart if I can.  I have also
>> wondered about adding a copy coalesce phase just before we enter RA,
>> which would ensure the copies are removed, instead of hoping RA assigns
>> the same reg to the source and destination of the copy making it a nop
>> that can be removed.
> The difficulty with coalescing is that if you get too aggressive then
> you end up removing degrees of freedom from the allocator and you can
> easily make the final results worse.

I agree, but being too aggressive leading to bad decisions/code is
true for a lot of optimizations. :-)   I do plan on first attacking
the conservative conflict info for pseudos and seeing what
that buys us before attempting any coalescing.

As for removing degrees of freedom for the allocator, sometimes that can
be a good thing, if it makes the allocator simpler.  For example, I
think we have forced the allocator to do too much by not only being an RA,
but being an instruction selector as well.  Doing both RA and instruction
selection at the same time makes everything very complicated and I think
we probably don't compute allocation costs correctly, since we seem to
calculate costs on a per-alternative, per-insn basis and I don't think we
ever see what the ramifications of using an alternative in one insn
have on the costs of another alternative in another insn.  Sometimes using
the cheapest alternative in one insn and the cheapest alternative in
another insn can lead us into a situation that requires spilling to
resolve the conflicting choices.
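The effect of costing alternatives one insn at a time can be seen in a deliberately tiny example. The sketch below is a toy model with made-up classes, costs and move penalty, not IRA's cost computation: picking the cheapest alternative in each insn independently forces a cross-class move, while looking at both insns together avoids it.

```c
#include <assert.h>
#include <limits.h>

/* Toy model: pseudo P is set by insn 1 and used by insn 2.  Each insn has
   NALT_TOY constraint alternatives; alternative i puts P in (hypothetical)
   register class i at the given cost.  If the two insns pick different
   classes, P must be moved between classes or spilled at MOVE_COST.  */
#define NALT_TOY 2

static int
cheapest_alt (const int cost[NALT_TOY])
{
  int best = 0;
  for (int i = 1; i < NALT_TOY; i++)
    if (cost[i] < cost[best])
      best = i;
  return best;
}

/* Pick the cheapest alternative in each insn independently.  */
static int
greedy_cost (const int c1[NALT_TOY], const int c2[NALT_TOY], int move_cost)
{
  assert (move_cost >= 0);
  int a1 = cheapest_alt (c1), a2 = cheapest_alt (c2);
  return c1[a1] + c2[a2] + (a1 != a2 ? move_cost : 0);
}

/* Consider both insns together, including the cross-class penalty.  */
static int
joint_cost (const int c1[NALT_TOY], const int c2[NALT_TOY], int move_cost)
{
  assert (move_cost >= 0);
  int best = INT_MAX;
  for (int a1 = 0; a1 < NALT_TOY; a1++)
    for (int a2 = 0; a2 < NALT_TOY; a2++)
      {
        int total = c1[a1] + c2[a2] + (a1 != a2 ? move_cost : 0);
        if (total < best)
          best = total;
      }
  return best;
}
```

With costs {1, 2} for insn 1, {3, 1} for insn 2 and a move penalty of 4, the greedy choice costs 6 while the joint choice costs 3. Exhaustive search obviously does not scale to whole functions; it only shows why per-insn decisions can be suboptimal.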

I've wondered if running something like lra_constraints() (but using
pseudos for fixups rather than hard regs) early in the rtl passes as
a pseudo instruction selection pass wouldn't make things easier for
the following passes like RA, etc?

Peter

Vladimir Makarov Dec. 21, 2018, 3:24 p.m. UTC | #10

On 12/20/2018 06:14 PM, Peter Bergner wrote:
> On 12/20/18 4:41 PM, Jeff Law wrote:
>> On 12/20/18 2:30 PM, Peter Bergner wrote:
>>> For stage1, I'd like to fix that conflict wart if I can.  I have also
>>> wondered about adding a copy coalesce phase just before we enter RA,
>>> which would ensure the copies are removed, instead of hoping RA assigns
>>> the same reg to the source and destination of the copy making it a nop
>>> that can be removed.
>> The difficulty with coalescing is that if you get too aggressive then
>> you end up removing degrees of freedom from the allocator and you can
>> easily make the final results worse.
> I agree, but being too aggressive leading to bad decisions/code is
> true for a lot of optimizations. :-)   I do plan on first attacking
> the conservative conflict info for pseudos and seeing what
> that buys us before attempting any coalescing.
When I started working on IRA, I tried several coalescing techniques
(I only recall conservative, iterative and optimistic ones).  The
results were not promising.  But that was a very long time ago: my major
target at that time was i686, and there were no accurate conflict
calculations for irregular register files.  So maybe it will work in
the current environment and with a different implementation.

Currently IRA has coalescing only for spilled pseudos after coloring 
(because mem<->mem moves are very expensive).  LRA has the same technique.

> As for removing degrees of freedom for the allocator, sometimes that can
> be a good thing, if it makes the allocator simpler.  For example, I
> think we have forced the allocator to do too much by not only being an RA,
> but being an instruction selector as well.  Doing both RA and instruction
> selection at the same time makes everything very complicated and I think
> we probably don't compute allocation costs correctly, since we seem to
> calculate costs on a per-alternative, per-insn basis and I don't think we
> ever see what the ramifications of using an alternative in one insn
> have on the costs of another alternative in another insn.  Sometimes using
> the cheapest alternative in one insn and the cheapest alternative in
> another insn can lead us into a situation that requires spilling to
> resolve the conflicting choices.
   I completely agree.  The big remaining part of modernizing GCC is
code selection.  I believe LLVM has a big advantage in this area over
GCC.  A modern approach could make RA much simpler.  But it is a very
big job involving changes in machine descriptions (a lot of them).

   I don't mean machine descriptions in IBURG style.  That would be an
enormous job requiring a lot of expertise, part of which has been lost
for some targets (I thought about starting this job several times
but gave up when I saw how much effort it would take; it would be an
even bigger job than writing IRA/LRA).

   I am just saying that you at least need a cost for each insn
alternative (maybe per sub-target).  Some approximation is possible,
though (like the number of insns generated from the alternative, or
even their size).

   There are, though, some smaller projects in this direction.  For
example, I tried to use code selection in register cost calculation (the
code is on the ira-select branch).  The algorithm first chooses an
alternative for each insn and then calculates costs and register
classes for the pseudos involved in the insn.  The chosen alternatives
could later be propagated to LRA (this work has not even started yet).
The cost of each insn alternative (if we add them to the md files in the
future) could easily be integrated into the algorithm.

   Unfortunately the algorithm did not improve SPEC2006 overall for x86-64
(i7-8700k), although one benchmark improved by about 5% if I remember
correctly.  But modern Intel CPUs are very insensitive to optimizations
(they are complicated black boxes which do their own optimizations, and
anecdotally I saw code where adding an additional move sped it up a
lot).  Maybe the algorithm will have better results on other targets
(power or aarch64).  I never tried other targets.
> I've wondered if running something like lra_constraints() (but using
> pseudos for fixups rather than hard regs) early in the rtl passes as
> a pseudo instruction selection pass wouldn't make things easier for
> the following passes like RA, etc?
>
I think it might.  As I wrote, we could propagate the above algorithm's
decisions to LRA.

Peter, also, if you are interested in doing RA work, there is another
problem: implementing sub-register level conflict calculations
in LRA.  Currently, IRA has a simple sub-register level conflict
calculation (see allocno objects), and when sub-registers are present
IRA and LRA make different decisions, which results in worse code
generation (there are some PRs for this).  It would also be a big RA
project.
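To illustrate why the granularity matters, here is a toy comparison (not GCC code; the types and names are hypothetical, loosely inspired by the idea behind IRA's allocno objects). If the low word of a double-word pseudo dies before the high word, a word-level view shows that a later single-word pseudo conflicts only with the high word and could reuse the low word's hard register, whereas a whole-register view reports a conflict.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open live range [start, end) over a toy linear insn numbering.  */
typedef struct
{
  int start, end;
} live_range;

static bool
ranges_overlap (live_range a, live_range b)
{
  assert (a.start <= a.end && b.start <= b.end);
  return a.start < b.end && b.start < a.end;
}

/* A toy double-word pseudo whose two words are tracked separately.  */
typedef struct
{
  live_range word[2];
} dword_pseudo;

/* Whole-register view: D is live from the first start to the last end
   of either word.  */
static bool
conflicts_whole_reg (dword_pseudo d, live_range s)
{
  live_range whole = d.word[0];
  if (d.word[1].start < whole.start)
    whole.start = d.word[1].start;
  if (d.word[1].end > whole.end)
    whole.end = d.word[1].end;
  return ranges_overlap (whole, s);
}

/* Word-level view: bitmask of the words of D that S conflicts with.  */
static unsigned
conflicting_words (dword_pseudo d, live_range s)
{
  unsigned mask = 0;
  for (int w = 0; w < 2; w++)
    if (ranges_overlap (d.word[w], s))
      mask |= 1u << w;
  return mask;
}
```

For a pseudo with words live over [0, 3) and [0, 6) and a single-word pseudo live over [4, 6), the whole-register view reports a conflict, while the word-level mask is 2 (only word 1 conflicts).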

Richard Biener Dec. 21, 2018, 3:55 p.m. UTC | #11

On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>
>
> On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > On 12/20/18 4:41 PM, Jeff Law wrote:
> >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> >>> For stage1, I'd like to fix that conflict wart if I can.  I have also
> >>> wondered about adding a copy coalesce phase just before we enter RA,
> >>> which would ensure the copies are removed, instead of hoping RA assigns
> >>> the same reg to the source and destination of the copy making it a nop
> >>> that can be removed.
> >> The difficulty with coalescing is that if you get too aggressive then
> >> you end up removing degrees of freedom from the allocator and you can
> >> easily make the final results worse.
> > I agree, but being too aggressive leading to bad decisions/code is
> > true for a lot of optimizations. :-)   I do plan on first attacking
> > the conservative conflict info for pseudos and seeing what
> > that buys us before attempting any coalescing.
> When I started working on IRA, I tried several coalescing techniques
> (I only recall conservative, iterative and optimistic ones).  The
> results were not promising.  But that was a very long time ago: my major
> target at that time was i686, and there were no accurate conflict
> calculations for irregular register files.  So maybe it will work in
> the current environment and with a different implementation.
>
> Currently IRA has coalescing only for spilled pseudos after coloring
> (because mem<->mem moves are very expensive).  LRA has the same technique.
>
> > As for removing degrees of freedom for the allocator, sometimes that can
> > be a good thing, if it makes the allocator simpler.  For example, I
> > think we have forced the allocator to do too much by not only being an RA,
> > but being an instruction selector as well.  Doing both RA and instruction
> > selection at the same time makes everything very complicated and I think
> > we probably don't compute allocation costs correctly, since we seem to
> > calculate costs on a per-alternative, per-insn basis and I don't think we
> > ever see what the ramifications of using an alternative in one insn
> > have on the costs of another alternative in another insn.  Sometimes using
> > the cheapest alternative in one insn and the cheapest alternative in
> > another insn can lead us into a situation that requires spilling to
> > resolve the conflicting choices.
>    I completely agree.  The big remaining part of modernizing GCC is
> code selection.  I believe LLVM has a big advantage in this area over
> GCC.  A modern approach could make RA much simpler.  But it is a very
> big job involving changes in machine descriptions (a lot of them).
>
>    I don't mean machine descriptions in IBURG style.  That would be an
> enormous job requiring a lot of expertise, part of which has been lost
> for some targets (I thought about starting this job several times
> but gave up when I saw how much effort it would take; it would be an
> even bigger job than writing IRA/LRA).
>
>    I am just saying that you at least need a cost for each insn
> alternative (maybe per sub-target).  Some approximation is possible,
> though (like the number of insns generated from the alternative, or
> even their size).
>
>    There are, though, some smaller projects in this direction.  For
> example, I tried to use code selection in register cost calculation (the
> code is on the ira-select branch).  The algorithm first chooses an
> alternative for each insn and then calculates costs and register
> classes for the pseudos involved in the insn.  The chosen alternatives
> could later be propagated to LRA (this work has not even started yet).
> The cost of each insn alternative (if we add them to the md files in the
> future) could easily be integrated into the algorithm.
>
>    Unfortunately the algorithm did not improve SPEC2006 overall for x86-64
> (i7-8700k), although one benchmark improved by about 5% if I remember
> correctly.  But modern Intel CPUs are very insensitive to optimizations
> (they are complicated black boxes which do their own optimizations, and
> anecdotally I saw code where adding an additional move sped it up a
> lot).  Maybe the algorithm will have better results on other targets
> (power or aarch64).  I never tried other targets.
> > I've wondered if running something like lra_constraints() (but using
> > pseudos for fixups rather than hard regs) early in the rtl passes as
> > a pseudo instruction selection pass wouldn't make things easier for
> > the following passes like RA, etc?
> >
> I think it might.  As I wrote, we could propagate the above algorithm's
> decisions to LRA.
>
> Peter, also, if you are interested in doing RA work, there is another
> problem: implementing sub-register level conflict calculations
> in LRA.  Currently, IRA has a simple sub-register level conflict
> calculation (see allocno objects), and when sub-registers are present
> IRA and LRA make different decisions, which results in worse code
> generation (there are some PRs for this).  It would also be a big RA
> project.

A further away (in pass distance) but maybe related project is to
replace the current "instruction selection" (I'm talking about RTL
expansion) with a scheme that works on (GIMPLE) SSA.  My
rough idea for prototyping pieces would be to first do this
completely on GIMPLE by replacing an "instruction" by
a GIMPLE asm with an "RTL" body (well, that doesn't have to
be explicit, it just needs to remember the insn chosen).  The
patterns are readily available in the .md files; we
just need some GIMPLE <-> RTL translation of the operations.

In the end this would do away with our named patterns
for expansion purposes.
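Richard's proposal is about where, and on which IL, the choice is made; the underlying idea of covering an expression tree with target patterns can be illustrated with a classic maximal-munch selector. This is a toy sketch with hypothetical types and pattern names (loosely modelled on the standard named patterns `add<mode>3`, `mul<mode>3` and `fma<mode>4`), not a design for GCC; it ignores costs, modes, and whether contracting into an FMA is permitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression trees, standing in for GIMPLE operations.  */
enum toy_op { OP_REG, OP_PLUS, OP_MULT };

struct toy_expr
{
  enum toy_op op;
  const struct toy_expr *a, *b;   /* operands; NULL for OP_REG */
};

/* Hypothetical "insns" the selector can choose from.  */
enum toy_insn { INSN_ADD, INSN_MUL, INSN_FMA };

#define MAX_TOY_INSNS 16

struct selection
{
  enum toy_insn insn[MAX_TOY_INSNS];
  int n;
};

static void
emit (struct selection *s, enum toy_insn insn)
{
  assert (s->n < MAX_TOY_INSNS);
  s->insn[s->n++] = insn;
}

/* Maximal munch: at each node try the largest pattern first, then
   select instructions for the operands it leaves uncovered.  */
static void
select_insns (const struct toy_expr *e, struct selection *s, bool have_fma)
{
  switch (e->op)
    {
    case OP_REG:
      return;   /* already in a register: nothing to emit */
    case OP_PLUS:
      if (have_fma && e->a->op == OP_MULT)
        {
          /* (x * y) + z is covered by a single fused multiply-add.  */
          select_insns (e->a->a, s, have_fma);
          select_insns (e->a->b, s, have_fma);
          select_insns (e->b, s, have_fma);
          emit (s, INSN_FMA);
          return;
        }
      select_insns (e->a, s, have_fma);
      select_insns (e->b, s, have_fma);
      emit (s, INSN_ADD);
      return;
    case OP_MULT:
      select_insns (e->a, s, have_fma);
      select_insns (e->b, s, have_fma);
      emit (s, INSN_MUL);
      return;
    }
}
```

For (x * y) + z, the selector picks one FMA when the toy target has it and a MUL followed by an ADD otherwise; recording that choice on the IL is roughly what the proposed GIMPLE asm with an "RTL" body would remember.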

Richard.

>

Segher Boessenkool Dec. 21, 2018, 4:25 p.m. UTC | #12

On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
> > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> >    I am just saying that you at least need a cost for each insn
> > alternative (maybe per sub-target).  Some approximation is possible,
> > though (like the number of insns generated from the alternative, or
> > even their size).

For RISC targets, most instructions have exactly the same cost (and all
have the same size, or just a few sizes if you look at thumb etc.)

> A further away (in pass distance) but maybe related project is to
> replace the current "instruction selection" (I'm talking about RTL
> expansion)

In current GCC the instruction selection is expand+combine really, and
more the latter even, for well-written backends anyway.  Most "smarts"
expand does only get in the way, even.

> with a scheme that works on (GIMPLE) SSA.  My
> rough idea for prototyping pieces would be to first do this
> completely on GIMPLE by replacing an "instruction" by
> a GIMPLE asm with an "RTL" body (well, that doesn't have to
> be explicit, it just needs to remember the insn chosen).  The
> patterns are readily available in the .md files; we
> just need some GIMPLE <-> RTL translation of the operations.
> 
> In the end this would do away with our named patterns
> for expansion purposes.

That sounds nice :-)

Do you see some way we can transition to such a scheme bit by bit, or
will there be a flag day?


Segher

Richard Biener Dec. 21, 2018, 5:35 p.m. UTC | #13

On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> > On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
> > > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> > >    I am just saying that you at least need a cost for each insn
> > > alternative (maybe per sub-target).  Some approximation is possible,
> > > though (like the number of insns generated from the alternative, or
> > > even their size).
>
> For RISC targets, most instructions have exactly the same cost (and all
> have the same size, or just a few sizes if you look at thumb etc.)
>
> > A further away (in pass distance) but maybe related project is to
> > replace the current "instruction selection" (I'm talking about RTL
> > expansion)
>
> In current GCC the instruction selection is expand+combine really, and
> more the latter even, for well-written backends anyway.  Most "smarts"
> expand does only get in the way, even.
>
> > with a scheme that works on (GIMPLE) SSA.  My
> > rough idea for prototyping pieces would be to first do this
> > completely on GIMPLE by replacing an "instruction" by
> > a GIMPLE asm with an "RTL" body (well, that doesn't have to
> > be explicit, it just needs to remember the insn chosen).  The
> > patterns are readily available in the .md files; we
> > just need some GIMPLE <-> RTL translation of the operations.
> >
> > In the end this would do away with our named patterns
> > for expansion purposes.
>
> That sounds nice :-)
>
> Do you see some way we can transition to such a scheme bit by bit, or
> will there be a flag day?

Well, we could do a "pre-expand" GIMPLE instruction selection
phase doing instruction selection on (parts of) the IL, either
substituting internal-function calls and using direct-optabs for
later RTL expansion (that would then introduce target-specific
internal functions) or trying the suggested scheme^Whack
of using a GIMPLE ASM kind with, instead of the asm text,
something that RTL expansion can work with.  The ASM approach
has the advantage that we could put in constraints to guide RTL
expansion, avoiding more "magic" (aka recog) there.

Not sure what the hard part here is, but I guess it might be
the mapping of GIMPLE SSA to .md file define-insn patterns.

Or maybe not.  As said, it should be reasonably easy to
handle it for the standard named patterns, which is where
you could prototype the plumbing w/o doing the .md file
parsing and matcher auto-generation.

Richard.

>
> Segher

Richard Biener Dec. 21, 2018, 5:36 p.m. UTC | #14

On Fri, Dec 21, 2018 at 6:35 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> > > On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
> > > > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > > > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> > > >    I am just saying that you at least need a cost for each insn
> > > > alternative (maybe per sub-target).  Some approximation is possible,
> > > > though (like the number of insns generated from the alternative, or
> > > > even their size).
> >
> > For RISC targets, most instructions have exactly the same cost (and all
> > have the same size, or just a few sizes if you look at thumb etc.)
> >
> > > A further away (in pass distance) but maybe related project is to
> > > replace the current "instruction selection" (I'm talking about RTL
> > > expansion)
> >
> > In current GCC the instruction selection is expand+combine really, and
> > more the latter even, for well-written backends anyway.  Most "smarts"
> > expand does only get in the way, even.
> >
> > > with a scheme that works on (GIMPLE) SSA.  My
> > > rough idea for prototyping pieces would be to first do this
> > > completely on GIMPLE by replacing an "instruction" by
> > > a GIMPLE asm with an "RTL" body (well, that doesn't have to
> > > be explicit, it just needs to remember the insn chosen).  The
> > > patterns are readily available in the .md files; we
> > > just need some GIMPLE <-> RTL translation of the operations.
> > >
> > > In the end this would do away with our named patterns
> > > for expansion purposes.
> >
> > That sounds nice :-)
> >
> > Do you see some way we can transition to such a scheme bit by bit, or
> > will there be a flag day?
>
> Well, we could do a "pre-expand" GIMPLE instruction selection
> phase doing instruction selection on (parts of) the IL, either
> substituting internal-function calls and using direct-optabs for
> later RTL expansion (that would then introduce target-specific
> internal functions) or trying the suggested scheme^Whack
> of using a GIMPLE ASM kind with, instead of the asm text,
> something that RTL expansion can work with.  The ASM approach
> has the advantage that we could put in constraints to guide RTL
> expansion, avoiding more "magic" (aka recog) there.
>
> Not sure what the hard part here is, but I guess it might be
> the mapping of GIMPLE SSA to .md file define-insn patterns.
>
> Or maybe not.  As said, it should be reasonably easy to
> handle it for the standard named patterns, which is where
> you could prototype the plumbing w/o doing the .md file
> parsing and matcher auto-generation.

To expand on this, I was thinking about doing such a partial
transition to get rid of TER: all the cases TER is now
required for would be "early instruction selected".

Richard.

> Richard.
>
> >
> > Segher

Segher Boessenkool Dec. 21, 2018, 5:59 p.m. UTC | #15

On Fri, Dec 21, 2018 at 06:35:14PM +0100, Richard Biener wrote:
> On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> > > On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
> > > > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > > > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> > > >    I am just saying that you at least need a cost for each insn
> > > > alternative (maybe per sub-target).  Some approximation is possible,
> > > > though (like the number of insns generated from the alternative, or
> > > > even their size).
> >
> > For RISC targets, most instructions have exactly the same cost (and all
> > have the same size, or just a few sizes if you look at thumb etc.)
> >
> > > A further away (in pass distance) but maybe related project is to
> > > replace the current "instruction selection" (I'm talking about RTL
> > > expansion)
> >
> > In current GCC the instruction selection is expand+combine really, and
> > more the latter even, for well-written backends anyway.  Most "smarts"
> > expand does only get in the way, even.
> >
> > > with a scheme that works on (GIMPLE) SSA.  My
> > > rough idea for prototyping pieces would be to first do this
> > > completely on GIMPLE by replacing an "instruction" by
> > > a GIMPLE asm with an "RTL" body (well, that doesn't have to
> > > be explicit, it just needs to remember the insn chosen).  The
> > > patterns are readily available in the .md files; we
> > > just need some GIMPLE <-> RTL translation of the operations.
> > >
> > > In the end this would do away with our named patterns
> > > for expansion purposes.
> >
> > That sounds nice :-)
> >
> > Do you see some way we can transition to such a scheme bit by bit, or
> > will there be a flag day?
> 
> Well, we could do a "pre-expand" GIMPLE instruction selection
> phase doing instruction selection on (parts of) the IL, either
> substituting internal-function calls and using direct-optabs for
> later RTL expansion (that would then introduce target-specific
> internal functions) or trying the suggested scheme^Whack
> of using a GIMPLE ASM kind with, instead of the asm text,
> something that RTL expansion can work with.  The ASM approach
> has the advantage that we could put in constraints to guide RTL
> expansion, avoiding more "magic" (aka recog) there.

Hrm, so a special kind of GIMPLE ASM, let's call it "GIMPLE RTL"...
That sounds good yes!  As an intermediate, of course :-)

> Not sure what the hard part here is, but I guess it might be
> the mapping of GIMPLE SSA to .md file define-insn patterns.

Expand does so *much* currently.  Maybe it shouldn't.  But then we
need to move much of what it does to a better place, because not all
of it is useless.

> Or maybe not.  As said, it should be reasonably easy to
> handle it for the standard named patterns, which is where
> you could prototype the plumbing w/o doing the .md file
> parsing and matcher auto-generation.


Segher

Peter Bergner Dec. 28, 2018, 4:13 p.m. UTC | #16

On 12/21/18 9:24 AM, Vladimir Makarov wrote:
> Peter, also, if you are interested in doing RA work, there is another problem:
> implementing sub-register level conflict calculations in LRA.
> Currently, IRA has a simple sub-register level conflict calculation (see
> allocno objects), and when sub-registers are present IRA and LRA make
> different decisions, which results in worse code generation (there are some
> PRs for this).  It would also be a big RA project.

Can you point me to the PRs?  Thanks.

Peter

Vladimir Makarov Jan. 4, 2019, 4:52 p.m. UTC | #17

On 12/28/2018 11:13 AM, Peter Bergner wrote:
> On 12/21/18 9:24 AM, Vladimir Makarov wrote:
>> Peter, also, if you are interested in doing RA work, there is another problem:
>> implementing sub-register level conflict calculations in LRA.
>> Currently, IRA has a simple sub-register level conflict calculation (see
>> allocno objects), and when sub-registers are present IRA and LRA make
>> different decisions, which results in worse code generation (there are some
>> PRs for this).  It would also be a big RA project.
> Can you point me to the PRs?  Thanks.
>
Peter, sorry for the delayed answer.  I am still on vacation.

Here is a recent PR I remember:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84757
