Message ID | 3499484.ePniFpkjG6@polaris |
---|---|
State | New |
Series | Fix PR rtl-optimization/87727 |

On Thu, Dec 20, 2018 at 12:43 PM Eric Botcazou <ebotcazou@adacore.com> wrote:
>
> Hi,
>
> this is a regression introduced on the SPARC by the somewhat controversial
> combiner change for hard registers: the compiler can no longer apply the leaf
> registers optimization to a small function so a register window is now used.
>
> The combiner change might be an overall win, but my understanding is that it's
> dependent on the target and SPARC seems to be in the wrong basket: almost all
> changes to the gcc.c-torture/compile testsuite at -O2 are pessimizations in
> the form of additional move instructions between registers on function entry.
>
> Clearly that's counter-productive for a LEAF_REGISTERS target like SPARC so
> the proposed fix is to re-enable hard register combining for leaf registers.
>
> Tested on SPARC/Solaris 11, OK for the mainline?

This only affects xtensa besides sparc so unless Segher objects this is OK.

Does this solve most of the pessimizations?

Please add a testcase if it doesn't solve existing FAILs.

Thanks,
Richard.

> 2018-12-20  Eric Botcazou  <ebotcazou@adacore.com>
>
>         PR rtl-optimization/87727
>         * combine.c (cant_combine_insn_p): On a LEAF_REGISTERS target, combine
>         again moves from leaf hard registers.
>
> --
> Eric Botcazou

On Thu, Dec 20, 2018 at 01:15:53PM +0100, Richard Biener wrote:
> This only affects xtensa besides sparc so unless Segher objects this is OK.
>
> Does this solve most of the pessimizations?
>
> Please add a testcase if it doesn't solve existing FAILs.

Generally it would be better to deal with that in RA, but if Vlad doesn't
have cycles for it right now, your hack isn't that bad.

Jakub

On Thu, Dec 20, 2018 at 01:23:57PM +0100, Jakub Jelinek wrote:
> Generally it would be better to deal with that in RA, but if Vlad doesn't
> have cycles for it right now, your hack isn't that bad.

It's not a terrible workaround, no.  It looks like it will make some asm
once again fail though?  If argument registers are forwarded to in the asm.

Segher

> Does this solve most of the pessimizations?

Yes, it does.

> Please add a testcase if it doesn't solve existing FAILs.

It fixes gcc.target/sparc/overflow-2.c.

> It's not a terrible workaround, no.  It looks like it will make some asm
> once again fail though?  If argument registers are forwarded to in the asm.

The combiner change looks like a big hammer for such a corner case though.

On Thu, Dec 20, 2018 at 09:07:41PM +0100, Eric Botcazou wrote:
> > It's not a terrible workaround, no.  It looks like it will make some asm
> > once again fail though?  If argument registers are forwarded to in the asm.
>
> The combiner change looks like a big hammer for such a corner case though.

That isn't the primary goal, no; the primary goal is to get better RA (by
not having the combiner do RA's work, and doing a mediocre job of it).  It
does solve these problems though.

Segher

On 12/20/18 2:26 PM, Segher Boessenkool wrote:
> That isn't the primary goal, no; the primary goal is to get better RA (by
> not having the combiner do RA's work, and doing a mediocre job of it).  It
> does solve these problems though.

My recent RA patches improved conflict information by noticing when we
don't need a conflict for a copy insn between a hard reg and a pseudo
when the source operand is live past the copy.  However, conflict info
for copies between two pseudos wasn't affected because IRA/LRA computes
liveness differently for pseudos than for hard regs.  I have seen copies
between two pseudos still get to RA and we're not able to do anything
about them at the moment because we still think they conflict.

For stage1, I'd like to fix that conflict wart if I can.  I have also
wondered about adding a copy coalesce phase just before we enter RA,
which would ensure the copies are removed, instead of hoping RA assigns
the same reg to the source and destination of the copy making it a nop
that can be removed.

Peter
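
The copy exception Peter describes can be illustrated with a textbook, Chaitin-style conflict builder. This is only a sketch under stated assumptions, not IRA/LRA code: the 8-register universe, `struct toy_insn`, `conflict` and `build_conflicts` are all hypothetical names, and the block is straight-line code with registers encoded as bits in a mask. At each definition the destination conflicts with everything live after the insn, except the source of a copy, because both hold the same value at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_REGS 8   /* toy register universe; hard regs and pseudos mixed */

/* One toy instruction: DEST is defined, USES is a bitmask of the
   registers read.  For a copy "dest = src", USES holds only SRC's bit.  */
struct toy_insn
{
  int dest;
  unsigned uses;
  bool is_copy;
};

/* Symmetric conflict matrix filled in by build_conflicts.  */
static bool conflict[N_REGS][N_REGS];

static void
add_conflict (int a, int b)
{
  assert (a >= 0 && a < N_REGS && b >= 0 && b < N_REGS);
  if (a != b)
    conflict[a][b] = conflict[b][a] = true;
}

/* Walk a straight-line block backwards, starting from LIVE_OUT, the set
   of registers live at the end of the block.  */
static void
build_conflicts (const struct toy_insn *insns, int n, unsigned live_out)
{
  unsigned live = live_out;

  memset (conflict, 0, sizeof conflict);
  for (int i = n - 1; i >= 0; i--)
    {
      unsigned dest_bit = 1u << insns[i].dest;
      unsigned clash = live & ~dest_bit;

      /* The copy exception: the source may stay live past the copy
         without conflicting with the destination.  */
      if (insns[i].is_copy)
        clash &= ~insns[i].uses;

      for (int r = 0; r < N_REGS; r++)
        if (clash & (1u << r))
          add_conflict (insns[i].dest, r);

      /* Standard backward liveness update: kill the def, add the uses.  */
      live = (live & ~dest_bit) | insns[i].uses;
    }
}
```

For a block such as `r4 = r1; r5 = r4 + r1`, the source `r1` is live past the copy, yet the sketch records no conflict between `r4` and `r1`. If either register is redefined later while the other is still live, the ordinary rule at that definition adds the conflict back.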

On 12/20/18 2:30 PM, Peter Bergner wrote:
> For stage1, I'd like to fix that conflict wart if I can.  I have also
> wondered about adding a copy coalesce phase just before we enter RA,
> which would ensure the copies are removed, instead of hoping RA assigns
> the same reg to the source and destination of the copy making it a nop
> that can be removed.

The difficulty with coalescing is that if you get too aggressive then
you end up removing degrees of freedom from the allocator and you can
easily make the final results worse.

Jeff
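
One well-known way to limit the loss of freedom Jeff mentions is a conservative coalescing test. The sketch below uses the textbook Briggs criterion on a toy interference graph; it is not taken from GCC, and `N_NODES`, `add_edge`, `degree` and `briggs_can_coalesce` are hypothetical names. Two copy-related nodes are merged only if the merged node would have fewer than K neighbours of significant degree (degree >= K), so the merge cannot turn a K-colourable graph into one that is not.

```c
#include <assert.h>
#include <stdbool.h>

#define N_NODES 6   /* toy interference graph size */

/* Record an undirected interference edge between A and B.  */
static void
add_edge (bool adj[N_NODES][N_NODES], int a, int b)
{
  assert (a != b);
  adj[a][b] = adj[b][a] = true;
}

/* Number of neighbours of node N.  */
static int
degree (bool adj[N_NODES][N_NODES], int n)
{
  int d = 0;
  for (int i = 0; i < N_NODES; i++)
    d += adj[n][i] ? 1 : 0;
  return d;
}

/* Briggs-style test: return true when merging A and B is considered
   safe for K available registers.  */
static bool
briggs_can_coalesce (bool adj[N_NODES][N_NODES], int a, int b, int k)
{
  /* Copy-related nodes must not interfere with each other.  */
  assert (a != b && !adj[a][b]);

  int significant = 0;
  for (int n = 0; n < N_NODES; n++)
    {
      if (n == a || n == b || !(adj[a][n] || adj[b][n]))
        continue;
      /* A neighbour shared by A and B loses one edge after the merge.  */
      int d = degree (adj, n) - ((adj[a][n] && adj[b][n]) ? 1 : 0);
      if (d >= k)
        significant++;
    }
  return significant < k;
}
```

The test is deliberately pessimistic: it refuses some merges that would have been harmless, which is the price for never making the graph harder to colour.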

On 12/20/18 4:41 PM, Jeff Law wrote:
> On 12/20/18 2:30 PM, Peter Bergner wrote:
>> For stage1, I'd like to fix that conflict wart if I can.  I have also
>> wondered about adding a copy coalesce phase just before we enter RA,
>> which would ensure the copies are removed, instead of hoping RA assigns
>> the same reg to the source and destination of the copy making it a nop
>> that can be removed.
> The difficulty with coalescing is that if you get too aggressive then
> you end up removing degrees of freedom from the allocator and you can
> easily make the final results worse.

I agree, but being too aggressive leading to bad decisions/code is
true for a lot of optimizations. :-)  I do plan on first attacking
the conservative conflict info for pseudos and seeing what
that buys us before attempting any coalescing.

As for removing degrees of freedom for the allocator, sometimes that can
be a good thing, if it can make the allocator simpler.  For example, I
think we have forced the allocator to do too much by not only being an RA,
but being an instruction selector as well.  Doing both RA and instruction
selection at the same time makes everything very complicated and I think
we probably don't compute allocation costs correctly, since we seem to
calculate costs on a per alternative per insn basis and I don't think we
ever see what the ramifications of using an alternative in one insn
are on the costs of another alternative in another insn.  Sometimes using
the cheapest alternative in one insn and the cheapest alternative in
another insn can lead us into a situation that requires spilling to
resolve the conflicting choices.

I've wondered if running something like lra_constraints() (but using
pseudos for fixups rather than hard regs) early in the rtl passes as
a pseudo instruction selection pass wouldn't make things easier for
the following passes like RA, etc?

Peter
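
The cost concern above (the locally cheapest alternative in each insn adding up to a worse overall result) can be shown with a toy model. Everything here is invented for illustration and is unrelated to how IRA or LRA actually compute costs: `struct toy_alt`, the `MISMATCH_PENALTY` value and the function names are hypothetical. Two insns share one pseudo; each alternative has its own cost and wants a particular register class for that pseudo, and a class mismatch is charged a penalty standing in for a reload or spill.

```c
#include <assert.h>
#include <limits.h>

#define N_ALTS 2
#define MISMATCH_PENALTY 10   /* invented stand-in for a reload or spill */

/* Toy alternative: its own cost and the register class it wants for
   the pseudo shared by both insns.  */
struct toy_alt
{
  int cost;
  int reg_class;
};

/* Cost of one combined choice, including the cross-insn penalty.  */
static int
total_cost (struct toy_alt a, struct toy_alt b)
{
  assert (a.cost >= 0 && b.cost >= 0);
  return a.cost + b.cost
         + (a.reg_class != b.reg_class ? MISMATCH_PENALTY : 0);
}

/* Index of the alternative with the lowest cost seen in isolation.  */
static int
cheapest_alt (const struct toy_alt alts[N_ALTS])
{
  int best = 0;
  for (int i = 1; i < N_ALTS; i++)
    if (alts[i].cost < alts[best].cost)
      best = i;
  return best;
}

/* Per-insn choice: each insn picks its own cheapest alternative.  */
static int
greedy_cost (const struct toy_alt a[N_ALTS], const struct toy_alt b[N_ALTS])
{
  return total_cost (a[cheapest_alt (a)], b[cheapest_alt (b)]);
}

/* Joint choice: try every pair and keep the cheapest total.  */
static int
joint_cost (const struct toy_alt a[N_ALTS], const struct toy_alt b[N_ALTS])
{
  int best = INT_MAX;
  for (int i = 0; i < N_ALTS; i++)
    for (int j = 0; j < N_ALTS; j++)
      {
        int c = total_cost (a[i], b[j]);
        if (c < best)
          best = c;
      }
  return best;
}
```

With alternatives `{1, class 0}, {2, class 1}` for the first insn and `{3, class 0}, {1, class 1}` for the second, the per-insn choice pays the mismatch penalty while the joint choice avoids it by accepting a slightly dearer alternative in the first insn. An exhaustive search like `joint_cost` obviously does not scale; it only makes the interaction visible.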

On 12/20/2018 06:14 PM, Peter Bergner wrote:
> I agree, but being too aggressive leading to bad decisions/code is
> true for a lot of optimizations. :-)  I do plan on first attacking
> the conservative conflict info for pseudos and seeing what
> that buys us before attempting any coalescing.

When I started to work on IRA, I tried several coalescing techniques
(I recall only conservative, iterative and optimistic ones).  The
results were not promising.  But that was a very long time ago; my major
target was i686 at that time and there were no accurate conflict
calculations for irregular register files.  So maybe it will work in the
current environment and in a different implementation.

Currently IRA has coalescing only for spilled pseudos after coloring
(because mem<->mem moves are very expensive).  LRA has the same technique.

> As for removing degrees of freedom for the allocator, sometimes that can
> be a good thing, if it can make the allocator simpler.  For example, I
> think we have forced the allocator to do too much by not only being an RA,
> but being an instruction selector as well.  Doing both RA and instruction
> selection at the same time makes everything very complicated and I think
> we probably don't compute allocation costs correctly, since we seem to
> calculate costs on a per alternative per insn basis and I don't think we
> ever see what the ramifications of using an alternative in one insn
> are on the costs of another alternative in another insn.  Sometimes using
> the cheapest alternative in one insn and the cheapest alternative in
> another insn can lead us into a situation that requires spilling to
> resolve the conflicting choices.

I completely agree.  The big remaining part of modernizing GCC is
code selection.  I believe LLVM has a big advantage in this area over
GCC.  A modern approach could make RA much simpler.  But it is a very
big job involving changes in machine descriptions (a lot of them).

I don't mean machine descriptions in IBURG style.  That would be a
huge, enormous job requiring a lot of expertise, part of which is lost
for some targets (I thought about starting this job several times
but gave up when I saw how much effort it would take; it would be an even
bigger job than writing IRA/LRA).

I am just saying that you need to have at least a cost for each insn
alternative (maybe sub-targets).  Some approximation may be possible,
though (like the number of insns generated from the alternative or even
their size).

There are some smaller projects in this direction, though.  For
example, I tried to use code selection in register cost calculation (the
code on the ira-select branch).  The algorithm is based on choosing an
alternative for each insn first and then calculating costs and register
classes for the pseudos involved in the insn.  The chosen alternatives
could be propagated later to LRA (this work has not even started yet).
The cost of each insn alternative (if we add them to the md files in the
future) could be easily integrated into the algorithm.

Unfortunately the algorithm did not improve SPEC2006 for x86-64
(i7-8700k) overall, although one benchmark was improved by about 5% if
I remember correctly.  But modern Intel CPUs are very insensitive
to optimizations (they are complicated black boxes which do their own
optimizations, and anecdotally I saw code where adding an additional
move sped it up a lot).  Maybe the algorithm will have better
results on other targets (power or aarch64).  I never tried other targets.

> I've wondered if running something like lra_constraints() (but using
> pseudos for fixups rather than hard regs) early in the rtl passes as
> a pseudo instruction selection pass wouldn't make things easier for
> the following passes like RA, etc?

I think it might.  As I wrote, we could propagate the above algorithm's
decisions to LRA.

Peter, also if you are interested in doing RA work, there is another
problem, which is to implement sub-register level conflict calculations
in LRA.  Currently, IRA has a simple sub-register level conflict
calculation (see allocno objects), and when sub-registers are present
IRA and LRA make different decisions, which results in worse code
generation (there are some PRs for this).  It would also be a big RA
project.

On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
> I completely agree.  The big remaining part of modernizing GCC is
> code selection.  I believe LLVM has a big advantage in this area over
> GCC.  A modern approach could make RA much simpler.  But it is a very
> big job involving changes in machine descriptions (a lot of them).

A further away (in pass distance) but maybe related project is to
replace the current "instruction selection" (I'm talking about RTL
expansion) with a scheme that works on (GIMPLE) SSA.  My
rough idea for prototyping pieces would be to first do this
completely on GIMPLE by replacing an "instruction" by
a GIMPLE asm with an "RTL" body (well, that doesn't have to
be explicit, it just needs to remember the insn chosen).  The
available patterns are readily available in the .md files, we
just need some GIMPLE <-> RTL translation of the operations.

In the end this would do away with our named patterns
for expansion purposes.

Richard.

On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov <vmakarov@redhat.com> wrote:
> > I am just saying that you need at least have cost for each insn
> > alternative (may be sub-targets).  Although some approximation can be
> > possible (like insns number generated from the alternative or even their
> > size).

For RISC targets, most instructions have exactly the same cost (and all
have the same size, or just a few sizes if you look at thumb etc.)

> A further away (in pass distance) but maybe related project is to
> replace the current "instruction selection" (I'm talking about RTL
> expansion)

In current GCC the instruction selection is expand+combine really, and
more the latter even, for well-written backends anyway.  Most "smarts"
expand does only get in the way, even.

> with a scheme that works on (GIMPLE) SSA.  My
> rough idea for prototyping pieces would be to first do this
> completely on GIMPLE by replacing a "instruction" by
> a GIMPLE asm with an "RTL" body (well, that doesn't have to
> be explicit, it just needs to remember the insn chosen).  The
> available patterns are readily available in the .md files, we
> just need some GIMPLE <-> RTL translation of the operations.
>
> In the end this would do away with our named patterns
> for expansion purposes.

That sounds nice :-)

Do you see some way we can transition to such a scheme bit by bit, or
will there be a flag day?

Segher

On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> That sounds nice :-)
>
> Do you see some way we can transition to such a scheme bit by bit, or
> will there be a flag day?

Well, we could do a "pre-expand" GIMPLE instruction selection
phase doing instruction selection on (parts of) the IL, either
substituting internal-function calls and using direct-optabs for
later RTL expansion (that would then introduce target-specific
internal functions) or trying the suggested scheme^Whack
of using a GIMPLE ASM kind with, instead of the asm text,
something that RTL expansion can work with.  The ASM approach
has the advantage that we could put in constraints to guide RTL
expansion, avoiding more "magic" (aka recog) there.

Not sure what the hard part here is, but I guess it might be
mapping of GIMPLE SSA to .md file define-insn patterns.

Or maybe not.  As said, it should be reasonably easy to
handle it for the standard named patterns, which is where
you could prototype the plumbing w/o doing the .md file
parsing and matcher auto-generation.

Richard.

On Fri, Dec 21, 2018 at 6:35 PM Richard Biener
<richard.guenther@gmail.com> wrote:
> Or maybe not.  As said, it should be reasonably easy to
> handle it for the standard named patterns, which is where
> you could prototype the plumbing w/o doing the .md file
> parsing and matcher auto-generation.

To expand on this I was thinking about doing such a partial transition
to get rid of TER - all the cases TER is now required for would
be "early instruction selected".

Richard.

On Fri, Dec 21, 2018 at 06:35:14PM +0100, Richard Biener wrote:
> Well, we could do a "pre-expand" GIMPLE instruction selection
> phase doing instruction selection on (parts) of the IL either
> substituting internal-function calls and use direct-optabs for
> later RTL expansion (that would then introduce target-specific
> internal functions) or try using the suggested scheme^Whack
> of using a GIMPLE ASM kind with instead of the asm text
> something that RTL expansion can work with.  The ASM approach
> has the advantage that we could put in constraints to guide RTL
> expansion, avoiding more "magic" (aka recog) there.

Hrm, so a special kind of GIMPLE ASM, let's call it "GIMPLE RTL"...
That sounds good yes!  As an intermediate, of course :-)

> Not sure what the hard part here is, but I guess it might be
> mapping of GIMPLE SSA to .md file define-insn patterns.

Expand does so *much* currently.  Maybe it shouldn't.  But then we need
to move much of what it does to a better place, because not all of it is
useless.

> Or maybe not.  As said, it should be reasonable easy to
> handle it for the standard named patterns which is where
> you could prototype the plumbing w/o doing the .md file
> parsing and matcher auto-generation.

Segher

On 12/21/18 9:24 AM, Vladimir Makarov wrote:
> Peter, also if you are interesting to do RA work, there is another problem
> which is to implement sub-register level conflict calculations in LRA.
> Currently, IRA has a simple subregister level conflict calculation (see
> allocno objects) and in a case of sub-register presence IRA and LRA decisions
> are different and this results in worse code generations (there are some PRs
> for this).  It would be also a big RA project to do.

Can you point me to the PRs?  Thanks.

Peter

On 12/28/2018 11:13 AM, Peter Bergner wrote:
> Can you point me to the PRs?  Thanks.

Peter, sorry for the delayed answer.  I am still on vacation.

Here is a recent PR I remember:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84757

Index: combine.c
===================================================================
--- combine.c (revision 267029)
+++ combine.c (working copy)
@@ -2349,7 +2349,12 @@ cant_combine_insn_p (rtx_insn *insn)
     dest = SUBREG_REG (dest);
   if (REG_P (src) && REG_P (dest)
       && ((HARD_REGISTER_P (src)
-           && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src)))
+           && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (src))
+#ifdef LEAF_REGISTERS
+           && ! LEAF_REGISTERS [REGNO (src)])
+#else
+           )
+#endif
           || (HARD_REGISTER_P (dest)
               && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (dest))
               && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dest))))))
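
The source-register half of the patched condition can be read as a small predicate. The following is a self-contained toy model rather than GCC code: the 8-register layout, the tables `fixed_regs_model` and `leaf_regs_model`, and the function `copy_blocked_by_src` are hypothetical stand-ins for `fixed_reg_set`, `LEAF_REGISTERS` and the first arm of the test in `cant_combine_insn_p`. The `leaf_target` flag plays the role of `#ifdef LEAF_REGISTERS`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: hard registers are numbered 0..7; any register number of
   8 or above stands for a pseudo register.  */
#define N_HARD_REGS 8

/* Hypothetical tables standing in for fixed_reg_set and LEAF_REGISTERS.  */
static const bool fixed_regs_model[N_HARD_REGS] = {1, 0, 0, 0, 0, 0, 0, 0};
static const bool leaf_regs_model[N_HARD_REGS]  = {0, 1, 1, 1, 0, 0, 0, 0};

static bool
is_hard_reg (int regno)
{
  assert (regno >= 0);
  return regno < N_HARD_REGS;
}

/* Return true when a register copy reading SRC must be kept out of
   combine.  This mirrors only the source-register arm of the patched
   test; the destination arm is left out of the model.  */
static bool
copy_blocked_by_src (int src, bool leaf_target)
{
  if (!is_hard_reg (src))
    return false;               /* pseudos are never blocked here */
  if (fixed_regs_model[src])
    return false;               /* fixed hard regs were already exempt */
  if (leaf_target && leaf_regs_model[src])
    return false;               /* the new exemption for leaf registers */
  return true;
}
```

In this model a copy from register 1 is blocked when `leaf_target` is false and allowed again when it is true, which corresponds to the ChangeLog wording "combine again moves from leaf hard registers". A non-leaf, non-fixed register such as 4 stays blocked either way.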