Message ID | 8cc18935-a9aa-513f-eef2-154130b70a3b@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: Disable -fcaller-saves by default | expand |
On Thu, 2020-07-23 at 14:25 -0500, Pat Haugen via Gcc-patches wrote: > Disable -fcaller-saves by default. > > This patch turns off -fcaller-saves by default so that IRA doesn't try > to use volatile regs for pseudos that are live across a call, which > would then require LRA to save/restore the reg around the call. > > Bootstrap/regtest on powerpc64le with no new regressions. Also ran a > CPU2017 benchmark comparison with no major differences (a few minor > improvements and one minor degradation). Ok for trunk? > > -Pat > > > 2020-07-23 Pat Haugen <pthaugen@linux.ibm.com> > > gcc/ > * common/config/rs6000/rs6000-common.c > (rs6000_option_optimization_table): Turn off -fcaller-saves. > > gcc/testsuite/ > * gcc.target/powerpc/caller-saves.c: New. What's driving this change? IRA will do a cost/benefit analysis to see using call clobbered registers like this is profitable or not. You're just turning the whole thing off. This can be particularly bad for performance if you have classes with no call saved registers. Jeff
Hi! On Thu, Jul 23, 2020 at 01:42:59PM -0600, Jeff Law wrote: > On Thu, 2020-07-23 at 14:25 -0500, Pat Haugen via Gcc-patches wrote: > > Disable -fcaller-saves by default. > > > > This patch turns off -fcaller-saves by default so that IRA doesn't try > > to use volatile regs for pseudos that are live across a call, which > > would then require LRA to save/restore the reg around the call. > > 2020-07-23 Pat Haugen <pthaugen@linux.ibm.com> > > > > gcc/ > > * common/config/rs6000/rs6000-common.c > > (rs6000_option_optimization_table): Turn off -fcaller-saves. > > > > gcc/testsuite/ > > * gcc.target/powerpc/caller-saves.c: New. > What's driving this change? Peter noticed IRA allocates stuff to volatile registers while it is life through a call, and then LRA has to correct that, not optimal. > IRA will do a cost/benefit analysis to see using call clobbered registers like > this is profitable or not. You're just turning the whole thing off. '-fcaller-saves' Enable allocation of values to registers that are clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code. This option is always enabled by default on certain machines, usually those which have no call-preserved registers to use instead. So, given what Peter saw, the analysis when to do or not do this isn't as good as could be hoped for. > This can be particularly bad for performance if you have classes with no call > saved registers. Do we though? Well, "d" for vectors, but there shouldn't be insns that require that? Segher
On Thu, 2020-07-23 at 15:19 -0500, Segher Boessenkool wrote: > Hi! > > On Thu, Jul 23, 2020 at 01:42:59PM -0600, Jeff Law wrote: > > On Thu, 2020-07-23 at 14:25 -0500, Pat Haugen via Gcc-patches wrote: > > > Disable -fcaller-saves by default. > > > > > > This patch turns off -fcaller-saves by default so that IRA doesn't try > > > to use volatile regs for pseudos that are live across a call, which > > > would then require LRA to save/restore the reg around the call. > > > 2020-07-23 Pat Haugen <pthaugen@linux.ibm.com> > > > > > > gcc/ > > > * common/config/rs6000/rs6000-common.c > > > (rs6000_option_optimization_table): Turn off -fcaller-saves. > > > > > > gcc/testsuite/ > > > * gcc.target/powerpc/caller-saves.c: New. > > What's driving this change? > > Peter noticed IRA allocates stuff to volatile registers while it is life > through a call, and then LRA has to correct that, not optimal. I can't recall if IRA or LRA handles the need to save them to the stack, but the whole point is you can do much better sometimes by saving into the stack with the caller-saves algorithm vs just giving up and spilling. > > > IRA will do a cost/benefit analysis to see using call clobbered registers like > > this is profitable or not. You're just turning the whole thing off. > > '-fcaller-saves' > Enable allocation of values to registers that are clobbered by > function calls, by emitting extra instructions to save and restore > the registers around such calls. Such allocation is done only when > it seems to result in better code. > > This option is always enabled by default on certain machines, > usually those which have no call-preserved registers to use > instead. > > So, given what Peter saw, the analysis when to do or not do this isn't > as good as could be hoped for. Sure. That obviously happens. When it does the usual response is to dig deeper into why, not turn the entire option off. Jeff
On Thu, Jul 23, 2020 at 02:29:00PM -0600, Jeff Law wrote: > On Thu, 2020-07-23 at 15:19 -0500, Segher Boessenkool wrote: > > On Thu, Jul 23, 2020 at 01:42:59PM -0600, Jeff Law wrote: > > > On Thu, 2020-07-23 at 14:25 -0500, Pat Haugen via Gcc-patches wrote: > > > > Disable -fcaller-saves by default. > > > > > > > > This patch turns off -fcaller-saves by default so that IRA doesn't try > > > > to use volatile regs for pseudos that are live across a call, which > > > > would then require LRA to save/restore the reg around the call. > > > > 2020-07-23 Pat Haugen <pthaugen@linux.ibm.com> > > > > > > > > gcc/ > > > > * common/config/rs6000/rs6000-common.c > > > > (rs6000_option_optimization_table): Turn off -fcaller-saves. > > > > > > > > gcc/testsuite/ > > > > * gcc.target/powerpc/caller-saves.c: New. > > > What's driving this change? > > > > Peter noticed IRA allocates stuff to volatile registers while it is life > > through a call, and then LRA has to correct that, not optimal. > I can't recall if IRA or LRA handles the need to save them to the stack, but the > whole point is you can do much better sometimes by saving into the stack with the > caller-saves algorithm vs just giving up and spilling. But the only thing this flag influences is what registers IRA allocates, nothing else, or what am I missing? > > > IRA will do a cost/benefit analysis to see using call clobbered registers like > > > this is profitable or not. You're just turning the whole thing off. > > > > '-fcaller-saves' > > Enable allocation of values to registers that are clobbered by > > function calls, by emitting extra instructions to save and restore > > the registers around such calls. Such allocation is done only when > > it seems to result in better code. > > > > This option is always enabled by default on certain machines, > > usually those which have no call-preserved registers to use > > instead. The documentation does not match reality btw, and even contradicts it for many targets. > > So, given what Peter saw, the analysis when to do or not do this isn't > > as good as could be hoped for. > Sure. That obviously happens. When it does the usual response is to dig deeper > into why, not turn the entire option off. That is after digging deeper! Maybe not deep enough, hrm. Peter? Segher
On 7/23/20 3:29 PM, Jeff Law wrote: >>> What's driving this change? >> >> Peter noticed IRA allocates stuff to volatile registers while it is life >> through a call, and then LRA has to correct that, not optimal. > I can't recall if IRA or LRA handles the need to save them to the stack, but the > whole point is you can do much better sometimes by saving into the stack with the > caller-saves algorithm vs just giving up and spilling. > >> >>> IRA will do a cost/benefit analysis to see using call clobbered registers like >>> this is profitable or not. You're just turning the whole thing off. Sorry for taking so long to reply. I'm the guilty party for asking Pat to submit the patch. :-) I was not aware IRA did that and just assumed it was a bug. For now, consider the patch withdrawn. That said, I will have to look at that cost computation, especially since when I last looked, IRA does not count potential prologue/epilogue save/restore insns if it were to assign a non-volatile register when computing reg costs. That would seem to be an important consideration here. I'll note this all came about when I was looking at PR96017, which is due to not shrink-wrapping a pseudo. That was due to it being live across a call. I first I thought (for the 2nd test case, not the original one) split_live_ranges_for_shrink_wrap() was nicely splitting the pseudo for us, but it must have been the caller-saves algorithm you mention above. However, that code doesn't handle the original test case, which I think it should. My thought for that bug was to introduce some extra splitting before RA (maybe as part of split_live_ranges_for_shrink_wrap?) where we split pseudos that are live across a call, but that have at least one path to the exit that doesn't cross a call. However, if we can beef up the caller-saves cost computation, maybe that would work too? Peter
On Mon, 2020-08-10 at 11:11 -0500, Peter Bergner wrote: > On 7/23/20 3:29 PM, Jeff Law wrote: > > > > What's driving this change? > > > > > > Peter noticed IRA allocates stuff to volatile registers while it is life > > > through a call, and then LRA has to correct that, not optimal. > > I can't recall if IRA or LRA handles the need to save them to the stack, but the > > whole point is you can do much better sometimes by saving into the stack with the > > caller-saves algorithm vs just giving up and spilling. > > > > > > IRA will do a cost/benefit analysis to see using call clobbered registers like > > > > this is profitable or not. You're just turning the whole thing off. > > Sorry for taking so long to reply. I'm the guilty party for asking Pat > to submit the patch. :-) I was not aware IRA did that and just assumed > it was a bug. For now, consider the patch withdrawn. That said, I > will have to look at that cost computation, especially since when I > last looked, IRA does not count potential prologue/epilogue save/restore > insns if it were to assign a non-volatile register when computing reg > costs. That would seem to be an important consideration here. No worries. Yea, you want to count the prologue/epilogue, as well as the saves/restores at the call points (which need frequency scaling and accounting for saves which don't need to happen because a prior save is sufficient to cover more than one call), etc. I think much of this code is still in caller-save.c. It's been eons since I worked on it, but I can probably get reacquainted with the implementation if you have questions > I'll note this all came about when I was looking at PR96017, which is > due to not shrink-wrapping a pseudo. That was due to it being live > across a call. I first I thought (for the 2nd test case, not the original > one) split_live_ranges_for_shrink_wrap() was nicely splitting the pseudo > for us, but it must have been the caller-saves algorithm you mention above. > However, that code doesn't handle the original test case, which I think > it should. > > My thought for that bug was to introduce some extra splitting before RA > (maybe as part of split_live_ranges_for_shrink_wrap?) where we split > pseudos that are live across a call, but that have at least one path > to the exit that doesn't cross a call. However, if we can beef up > the caller-saves cost computation, maybe that would work too? I've gone back and forth on pre allocation splitting as well as post-allocating splitting and re-allocation. I could argue either side of that discussion -- given we've got a bit of special code for splitting to help shrink wrapping, maybe that's the best place if we need to do splitting before RA since this was triggered by digging into a shrink-wrapping problem. I'd probably start by making sure the cost computation is sane though. Jeff
On Tue, Aug 25, 2020 at 02:35:51PM -0600, Jeff Law wrote: > I've gone back and forth on pre allocation splitting as well as post-allocating > splitting and re-allocation. I could argue either side of that discussion -- If you end up wanting something split it is best to do it early. But if you cannot predict well if you want it split eventually, there is no good way to do this afaics :-( > given we've got a bit of special code for splitting to help shrink wrapping, Only very minimal stuff for splitting copies from a hard reg (argument reg) to a pseudo. And then only for all together. > I'd probably start by making sure the cost computation is sane though. Yeah. Segher
On Wed, 2020-08-26 at 15:58 -0500, Segher Boessenkool wrote: > On Tue, Aug 25, 2020 at 02:35:51PM -0600, Jeff Law wrote: > > I've gone back and forth on pre allocation splitting as well as post-allocating > > splitting and re-allocation. I could argue either side of that discussion -- > > If you end up wanting something split it is best to do it early. The problem with early splitting is you don't know if you want to split or not. You're making an educated guess how the split will interact with register allocation. If you guess wrong, often the allocator will save you and assign the split pseudos to the same hard reg, but not always. Been there more than once. > But if > you cannot predict well if you want it split eventually, there is no > good way to do this afaics :-( Precisely. In the old LRS bits (Meissner with a little help from me, circa 1995) we actually recorded the original pseudo as well as all its split versions in a new RTL node. We could then undo the splitting if it wasn't helpful. IIRC we also use that node to improve debugging of objects that were subject to LRS. It worked reasonably well, but was fairly hard to maintain. I think we managed to drop in some of the infrastructure into EGCS and shipped to to our customer, but I don't think we ever got the main pass submitted. Eventually it got dropped as it was fairly painful to maintain. jeff
diff --git a/gcc/common/config/rs6000/rs6000-common.c b/gcc/common/config/rs6000/rs6000-common.c index ee37b9dc90b..6d68834aed2 100644 --- a/gcc/common/config/rs6000/rs6000-common.c +++ b/gcc/common/config/rs6000/rs6000-common.c @@ -43,8 +43,13 @@ static const struct default_options rs6000_option_optimization_table[] = on rs6000, turn it off by default. */ { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 }, + /* Don't allow IRA to use volatile regs across function calls. */ + { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 }, + /* Double growth factor to counter reduced min jump length. */ { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 }, + + /* Following marks the end of the list and needs to remain last. */ { OPT_LEVELS_NONE, 0, NULL, 0 } }; diff --git a/gcc/testsuite/gcc.target/powerpc/caller-saves.c b/gcc/testsuite/gcc.target/powerpc/caller-saves.c new file mode 100644 index 00000000000..846356f16f7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/caller-saves.c @@ -0,0 +1,9 @@ +/* { dg-do compile */ +/* { dg-options "-O2 -fverbose-asm" } */ + +void +foo () +{ +} + +/* { dg-final { scan-assembler-not "fcaller-saves" } } */