Message ID | 87d1v7v1g9.fsf@redhat.com |
---|---|
State | New |
Headers | show |
> (This is on the RL78 target where HImode values occupy two hard > registers and QImode values only one. The bug however is generic, not > RL78 specific). > > The REE pass transforms this into: > > (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) > (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18)))) > [...] > (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) > [...] > (insn 88 deleted) > > Note how the new set at insn 45 clobbers the value loaded by insn 44 > into r11. I had a look around. There's code testing HARD_REGNO_NREGS in ree.c:combine_set_extension. It's inside #if 0, and labelled "temporarily disabled". See if enabling that helps you? (Jeff, that #if 0 was added by you). Bernd
On 11/18/2015 05:52 PM, Bernd Schmidt wrote: >> (This is on the RL78 target where HImode values occupy two hard >> registers and QImode values only one. The bug however is generic, not >> RL78 specific). >> >> The REE pass transforms this into: >> >> (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) >> (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18)))) >> [...] >> (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) >> [...] >> (insn 88 deleted) >> >> Note how the new set at insn 45 clobbers the value loaded by insn 44 >> into r11. > > I had a look around. There's code testing HARD_REGNO_NREGS in > ree.c:combine_set_extension. It's inside #if 0, and labelled > "temporarily disabled". See if enabling that helps you? (Jeff, that #if > 0 was added by you). I was going to dig into this discussion tomorrow since I was the last one in that code. Just had to get that threading regression resolved first :-) jeff
Hi Bernd, > I had a look around. There's code testing HARD_REGNO_NREGS in > ree.c:combine_set_extension. It's inside #if 0, and labelled > "temporarily disabled". See if enabling that helps you? (Jeff, that #if > 0 was added by you). I suspect that the code was disabled because it prevented too many useful optimization opportunities. The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Cheers Nick
On 11/19/2015 04:34 PM, Nick Clifton wrote: > Hi Bernd, > >> I had a look around. There's code testing HARD_REGNO_NREGS in >> ree.c:combine_set_extension. It's inside #if 0, and labelled >> "temporarily disabled". See if enabling that helps you? (Jeff, that #if >> 0 was added by you). > > I suspect that the code was disabled because it prevented too many > useful optimization opportunities. Well, correctness goes first. Since reenabling it fixes your problem, that change is approved (conditional on testing as usual obviously). Bernd
On 11/19/2015 09:53 AM, Bernd Schmidt wrote: > On 11/19/2015 04:34 PM, Nick Clifton wrote: >> Hi Bernd, >> >>> I had a look around. There's code testing HARD_REGNO_NREGS in >>> ree.c:combine_set_extension. It's inside #if 0, and labelled >>> "temporarily disabled". See if enabling that helps you? (Jeff, that #if >>> 0 was added by you). >> >> I suspect that the code was disabled because it prevented too many >> useful optimization opportunities. > > Well, correctness goes first. Since reenabling it fixes your problem, > that change is approved (conditional on testing as usual obviously). I couldn't convince myself it was a correct check. Having Nick's testcase should help with the analysis. Frankly, the overall structure of ree is a mess -- I've found it incredibly difficult to follow every time I've had to look at it. Let me take a peek today and see if I can get familiar enough with this code again to help. Jeff
On 11/19/2015 08:34 AM, Nick Clifton wrote: > Hi Bernd, > >> I had a look around. There's code testing HARD_REGNO_NREGS in >> ree.c:combine_set_extension. It's inside #if 0, and labelled >> "temporarily disabled". See if enabling that helps you? (Jeff, that #if >> 0 was added by you). > > I suspect that the code was disabled because it prevented too many > useful optimization opportunities. > > The code there would solve this problem, but the approach is is overly > cautious, since it disables the optimization for all extensions that > increase the number of hard registers used. Some of these will be > viable candidates, provided that the extra hard registers are no used. > (This is certainly true for the RL78, where the (patched) optimization > does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Jeff
On 11/18/2015 07:15 AM, Nick Clifton wrote: > Hi Guys, > > I recently discovered a bug in the current Redundant Extension > Elimination optimization. If the candidate extension instruction > increases the number of hard registers used, the pass does not check > to see if these extra registers are live between the definition and > the extension. > > For example: > > (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) > (insn 45 (set (reg:QI r10) (mem:QI (reg:HI r18))) > [...] > (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) > [...] > (insn 88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10))) > > (This is on the RL78 target where HImode values occupy two hard > registers and QImode values only one. The bug however is generic, not > RL78 specific). Right. I can recall walking though multi-reg cases on my whiteboard. There was one particular case Jakub was concerned about in this space, but I couldn't contrive a testcase to trigger and I was getting to the point where I was struggling to be able to reason about the behaviour of the code. > > The REE pass transforms this into: > > (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) > (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18)))) > [...] > (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) > [...] > (insn 88 deleted) > > Note how the new set at insn 45 clobbers the value loaded by insn 44 > into r11. Right. > > The patch below is my attempt to fix this. It is not correct > however. I could not work out how to determine if a given hard > register is live at any point between two insns. So instead I used > the liveness information associated with the basic block containing > the definition. If one of the extended registers is live or becomes > live in this block, and this block is not the same block as the one > containing the extension insn, then I stop the optimization. This > works for the RL78 for all of the cases that I could find. The way this pass has dealt with this class of problems is the various routines in rtlanal.c reg_used_between_p, reg_set_between_p and the like. > > So ... comments please ? > > Will gcc ever generate a single basic block that contains both the > definition and the extension instructions, but also where the extra > hard registers are used for another purpose as well ? I can't see any reason why it would not. jeff
Hi Jeff, >> The code there would solve this problem, but the approach is is overly >> cautious, since it disables the optimization for all extensions that >> increase the number of hard registers used. Some of these will be >> viable candidates, provided that the extra hard registers are no used. >> (This is certainly true for the RL78, where the (patched) optimization >> does improve code, even though the widening does use extra registers). > Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. Cheers Nick
On 11/20/2015 02:19 AM, Nick Clifton wrote: > Hi Jeff, > >>> The code there would solve this problem, but the approach is is overly >>> cautious, since it disables the optimization for all extensions that >>> increase the number of hard registers used. Some of these will be >>> viable candidates, provided that the extra hard registers are no used. >>> (This is certainly true for the RL78, where the (patched) optimization >>> does improve code, even though the widening does use extra registers). >> Nick -- can you pass along your testcode? > > Sure - this is for the RL78 toolchain. In theory the problem is > generic, but I have not tested other toolchains. Right. It just really helps me to have something I can poke at -- it's just the type of learner I am. I tried to trip the test in that #if several times last year and never came up with anything. So naturally if we can trip it with the rl78 I want to dig into it. > > Compile the gcc.c-torture/execute/pr42833.c or > gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free > also specified on the command line. Without -free these tests pass. > With -free they fail. Perfect. Building rl78-elf cross bits as I type... jeff
On 11/20/2015 02:19 AM, Nick Clifton wrote: > Hi Jeff, > >>> The code there would solve this problem, but the approach is is overly >>> cautious, since it disables the optimization for all extensions that >>> increase the number of hard registers used. Some of these will be >>> viable candidates, provided that the extra hard registers are no used. >>> (This is certainly true for the RL78, where the (patched) optimization >>> does improve code, even though the widening does use extra registers). >> Nick -- can you pass along your testcode? > > Sure - this is for the RL78 toolchain. In theory the problem is > generic, but I have not tested other toolchains. > > Compile the gcc.c-torture/execute/pr42833.c or > gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free > also specified on the command line. Without -free these tests pass. > With -free they fail. So this looks like a pretty fundamental failing in ree.c and AFAICT has been there since day 1. Thankfully, I don't think it's likely to trigger all that often. It's useful to ponder two cases. One where we're going to add a copy and one where we don't. I added the copy case during the 4.9 cycle as an extension of existing code. It only fires in easy to analyze circumstances -- when the original set and the extension are in the same basic block, but have different destination registers. In the copy case. The code checks that the destination of the *extension* is neither used nor set between the original set and the extension. So in the case of multi-word hard reg, we'll verify that the entire multi-word hard reg survives from the original insn to the extension. That avoids this problem in cases where a copy is needed. However, in the case where no copy is needed, no such checks are made. In fact the checks could be fairly painful as the original set and the extension can be in different blocks with arbitrary flow between them. I would hazard a guess that the authors simply didn't consider the multi-hard reg case. Essentially if the original set reached an extension, then obviously the original set got there unharmed and the extended destination should reach as well -- except that doesn't apply to multi-word hard regs. Given the expense in checking all the potential paths between the original set and the extension, I think just disallowing this case ought to be is the best way to go. Ideally we'll filter out that case before we get to combine_set_extension. But if we can't, it's easy enough to test there. Jeff
> I would hazard a guess that the authors simply didn't consider the > multi-hard reg case. Essentially if the original set reached an > extension, then obviously the original set got there unharmed and the > extended destination should reach as well -- except that doesn't apply > to multi-word hard regs. This pass started as a Zero-Extension Elimination pass for x86-64 and was only considering implicit SI->DI extensions initially. The algorithm was tailored to this specific pattern and I agree that the pass should be reimplemented.
On 11/18/2015 07:15 AM, Nick Clifton wrote: > Hi Guys, > > I recently discovered a bug in the current Redundant Extension > Elimination optimization. If the candidate extension instruction > increases the number of hard registers used, the pass does not check > to see if these extra registers are live between the definition and > the extension. > > For example: > > (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) > (insn 45 (set (reg:QI r10) (mem:QI (reg:HI r18))) > [...] > (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) > [...] > (insn 88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10))) > > (This is on the RL78 target where HImode values occupy two hard > registers and QImode values only one. The bug however is generic, not > RL78 specific). > > The REE pass transforms this into: > > (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) > (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18)))) > [...] > (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) > [...] > (insn 88 deleted) > > Note how the new set at insn 45 clobbers the value loaded by insn 44 > into r11. > > The patch below is my attempt to fix this. It is not correct > however. I could not work out how to determine if a given hard > register is live at any point between two insns. So instead I used > the liveness information associated with the basic block containing > the definition. If one of the extended registers is live or becomes > live in this block, and this block is not the same block as the one > containing the extension insn, then I stop the optimization. This > works for the RL78 for all of the cases that I could find. > > So ... comments please ? > > Will gcc ever generate a single basic block that contains both the > definition and the extension instructions, but also where the extra > hard registers are used for another purpose as well ? > > Tested with no regressions (or fixes) on an x86-pc-linux-gnu target. > Also tested with no regression and 7 fixes on an rl78-elf target. > > Cheers > Nick > > gcc/ChangeLog > 2015-11-18 Nick Clifton <nickc@redhat.com> > > * ree.c (regs_live_between): New function. > (add_removable_extension): If the extension uses more hard > registers than the definition then check that the extra hard > registers are not live between the definition and the > extension. > > @@ -1080,6 +1123,30 @@ > } > } > > + /* Fourth, if the extended version occupies more registers than > + the original version then make sure that these extra registers > + are not live between the definition and the extension. */ > + if (HARD_REGNO_NREGS (REGNO (dest), mode) > + > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg))) So this seems like a reasonable place to catch this. I certainly prefer to avoid work later by filtering earlier like you've done. The problem is with your implementation of regs_live_between. You're assuming we've got straight-line code between the original set and the extension. That's not guaranteed. Essentially we've got use-def chains via the DF framework. So, in theory, they can be in different blocks and arbitrary flow control between them. In fact, the extension could be before the original set in the insn chain (though obviously not for execution order). I think it's reasonable to just punt when we see that the source/dest register of the extension are the same and the number of hard regs is different. That's what I'm testing now. jeff
Index: gcc/ree.c =================================================================== --- gcc/ree.c (revision 230517) +++ gcc/ree.c (working copy) @@ -952,6 +952,49 @@ return false; } +/* Returns TRUE if any of the registers [start, stop) are live between DEF and up + to, but not including, INSN. */ + +static bool +regs_live_between (rtx_insn * insn, + struct df_link * def, + unsigned int start, + unsigned int stop) +{ + basic_block bb; + + /* FIXME: This is an incomplete test. It only checks the DEF and USE states of the + registers in basic block of DEF. If any of them match the start-stop range and + the INSN is not in the same block, then it is assumed that the registers must be + live. */ + /* These first few checks are just paranoia... */ + if (def != NULL + && def->ref != NULL + && def->ref->base.insn_info != NULL + && def->ref->base.insn_info->insn != NULL + && (bb = BLOCK_FOR_INSN (def->ref->base.insn_info->insn)) != NULL + && bb != BLOCK_FOR_INSN (insn)) + { + struct df_lr_bb_info * bb_info = df_lr_get_bb_info (bb->index); + bitmap_iterator bi; + unsigned int reg; + + EXECUTE_IF_SET_IN_BITMAP (& bb_info->def, 0, reg, bi) + { + if (reg >= start && reg < stop) + return true; + } + + EXECUTE_IF_SET_IN_BITMAP (& bb_info->out, 0, reg, bi) + { + if (reg >= start && reg < stop) + return true; + } + } + + return false; +} + /* Add an extension pattern that could be eliminated. */ static void @@ -1080,6 +1123,30 @@ } } + /* Fourth, if the extended version occupies more registers than + the original version then make sure that these extra registers + are not live between the definition and the extension. */ + if (HARD_REGNO_NREGS (REGNO (dest), mode) + > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg))) + { + unsigned int start_reg = REGNO (reg) + HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)); + unsigned int stop_reg = REGNO (reg) + HARD_REGNO_NREGS (REGNO (dest), mode); + + for (def = defs; def; def = def->next) + { + if (regs_live_between (insn, def, start_reg, stop_reg)) + { + if (dump_file) + { + fprintf (dump_file, "Cannot eliminate extension:\n"); + print_rtl_single (dump_file, insn); + fprintf (dump_file, " because extended register(s) are already used\n"); + } + return; + } + } + } + /* Then add the candidate to the list and insert the reaching definitions into the definition map. */ ext_cand e = {expr, code, mode, insn};