Message ID | Pine.LNX.4.64.1111071606580.24579@digraph.polyomino.org.uk |
---|---|
State | New |
Headers | show |
On Mon, Nov 07, 2011 at 04:10:38PM +0000, Joseph S. Myers wrote: > FWIW, when I encountered such a problem in 4.4-based tools I found I > needed the following change to stack_tie patterns to fix it (applied to > and rediffed against trunk, but not tested there), in addition to a > DEFAULT_ABI == ABI_V4 check such as you added. Given all the confusion > about this class of bugs and whether they should be addressed in each back > end or in a target-independent way, I didn't try to send this upstream at > the time. > > 2011-04-11 Joseph Myers <joseph@codesourcery.com> > > * config/rs6000/rs6000.md (stack_tie): Use (mem:BLK (scratch)) as > output of SET. > (restore_stack_block, restore_stack_nonlocal): Update UNSPEC_TIE > patterns. OK, so you made the stack ties conflict with all memory. I wondered about that too, but didn't find a testcase where it mattered. It would certainly be safer to use the big hammer, but on the other hand it can restrict scheduling unnecessarily. For example void byref (int *a) { int x; asm ("#%0 set here" : "=r" (x) : : "r29", "r30"); *a = x; } with the existing stack ties, the write to *a happens after stack deallocation.
On Tue, 8 Nov 2011, Alan Modra wrote: > OK, so you made the stack ties conflict with all memory. I wondered > about that too, but didn't find a testcase where it mattered. It The test I had was one of those from the various PRs for this issue. int find_num(int i) { const int arr[5] = {0, 1, 2, 3, 4}; return arr[i]; } (-O2 -mcpu=603). But I don't know if it reproduces with FSF 4.4 or trunk; there were plenty of other local changes in the 4.4-based tree where I found this to be necessary.
On Tue, Nov 08, 2011 at 12:30:36AM +0000, Joseph S. Myers wrote: > On Tue, 8 Nov 2011, Alan Modra wrote: > > > OK, so you made the stack ties conflict with all memory. I wondered > > about that too, but didn't find a testcase where it mattered. It > > The test I had was one of those from the various PRs for this issue. > > int find_num(int i) > { > const int arr[5] = {0, 1, 2, 3, 4}; > return arr[i]; > } > > (-O2 -mcpu=603). But I don't know if it reproduces with FSF 4.4 or trunk; > there were plenty of other local changes in the 4.4-based tree where I > found this to be necessary. The testcase generates good code on current gcc-4.7, gcc-4.6, and gcc-4.5, but fails on current gcc-4.4. I thought I had this PR fixed on all active branches. :-( It's been a while since I looked at what was happening with this testcase, but from memory what stops sheduling over the stack_tie is alias.c:base_alias_check. Which relies on tracking registers to see whether two memory accesses using different regs could alias. Quite possibly gcc-4.4 is deficient in this area.
On Nov 8, 2011, at 6:13 AM, Alan Modra wrote: > It's been a while since I looked at what was happening with this > testcase, but from memory what stops sheduling over the stack_tie is > alias.c:base_alias_check. Which relies on tracking registers to see > whether two memory accesses using different regs could alias. Quite > possibly gcc-4.4 is deficient in this area. I re-investigated our testcase yesterday and this is indeed what makes the difference. I couldn't reproduce the misbehavior on mainline, not because of the recent change you mentioned upthread, but because of an earlier one, rev 180522. The problem is visible on 180521, and here is what is happening (roughly): The testcase (sources below), compiled with -O1 -fschedule-insns2 by a powerpc-wrs-vxworks compiler produces in .pro_and_epilogue: (insn 29 22 28 2 (set (reg/f:SI 11 11 [133]) (high:SI (symbol_ref:SI ("mysym11") ...))) ./t.c:19 408 {elf_high} (expr_list:REG_EQUIV (high:SI (symbol_ref:SI ("mysym11") ...) (insn 28 29 24 2 (set (reg/f:SI 0 0 [orig:123 p11$3 ] [123]) (lo_sum:SI (reg/f:SI 11 11 [133]) (symbol_ref:SI ("mysym11") ) (expr_list:REG_EQUIV (symbol_ref:SI ("mysym11") ...) ... (note 39 24 40 2 NOTE_INSN_EPILOGUE_BEG) (insn 40 39 41 2 (set (reg:SI 11 11) (plus:SI (reg/f:SI 31 31) (const_int 48 [0x30]))) ./t.c:22 -1 ... (insn 43 42 44 2 (set (reg:SI 25 25) (mem/c:SI (plus:SI (reg:SI 11 11) (const_int -28 [0xffffffffffffffe4])) [0 S4 A8])) ./t.c:22 -1 (insn/f 44 43 45 2 (set (reg/f:SI 31 31) (mem/c:SI (plus:SI (reg:SI 11 11) (const_int -4 [0xfffffffffffffffc])) [0 S4 A8])) ./t.c:22 -1 (insn 45 44 46 2 (set (mem/c:BLK (reg/f:SI 1 1) [0 A8]) (unspec:BLK [ (mem/c:BLK (reg/f:SI 1 1) [0 A8]) ] UNSPEC_TIE)) ./t.c:22 -1 (insn/f 46 45 47 2 (set (reg/f:SI 1 1) (reg:SI 11 11)) ./t.c:22 -1 insn 45 is the "stack tie" intended to prevent a move of the sp restore (insn 46) prior to the register restores accessing the frame (insn 43 here) Most of the time, as you say, this works because the mem accesses are considered conflicting. In this very particular case, we get into base_alias_check with x = (reg/f:SI 1 1) y = (plus:SI (reg:SI 11 11) (const_int -4 [0xfffffffffffffffc])) from which base_alias_check infers x_base = (address:SI (reg/f:SI 1 1)) y_base = (symbol_ref:SI ("mysym11") ...) (Erm, _that_ doesn't sound right: r11 certainly is not a base access to mysym11 past insn 40) x_base and y_base are both != 0, and we get (still within base_alias_check) into /* If one address is a stack reference there can be no alias: stack references using different base registers do not alias, a stack reference can not alias a parameter, and a stack reference can not alias a global. */ Eventually, the tie and the sp restore move up together prior to the register restore. rev 180522 changes the registers allocation so that r11 is not used in the function body any more. That r11 is considered a base to mysym11 certainly looks incorrect. Now, looking at the comment quoted above suggests that the current stack tie mechanism used in emit_epilogue (relying on mem r11 to always conflict with mem:blk sp) is optimistic, regardless, which Joesph's experience seem to confirm. Joseph resorted to mem:scratch to impose a strong barrier. That's certainly safe and I don't think the performance impact can be significant, so this looks like a good way out. I tried an approach in between here: replace the stack_tie insn by a frame_tie insn with frame_base_rtx as the second base. In the case above, with frame_base_rtx==r11, this is something like (set (mem/c:BLK (reg:SI 11 11) [0 A8]) (unspec:BLK [ (mem/c:BLK (reg:SI 11 11) [0 A8]) (mem/c:BLK (reg/f:SI 1 1) [0 A8]) As I mentioned upthread, I'm still unclear on a couple of aspects though. For example, ISTM that there are paths where we might need a tie and different frame base regs were used before. The conditions controlling the different paths are intricate so I'm not quite sure. Joseph's approach would for sure void the need to seek answers :) Olivier -- char mysym11; char * volatile g11; void foo (long x) { /* Force frame_pointer_needed & use of r11 as a frame_reg_rtx from emit_epilogue. */ char volatile s [x]; s[0] = 12; /* Force a register save/restore. */ asm volatile ("" : : : "25"); /* Try to force something like (set r11 (symref:mysym)) in the function body. */ { register char * volatile p11 asm("11"); p11 = &mysym11; g11 = p11; } }
On Tue, Nov 08, 2011 at 11:37:57AM +0100, Olivier Hainque wrote: > Joseph resorted to mem:scratch to impose a strong barrier. That's certainly > safe and I don't think the performance impact can be significant, so this > looks like a good way out. I agree. Our fancy powerpc stack deallocation barriers rely too much on alias analysis, which as you've shown (thanks!) isn't reliable in this instance. Other targets just use the mem:scratch barrier. > I tried an approach in between here: replace the stack_tie insn by a > frame_tie insn with frame_base_rtx as the second base. In the case above, > with frame_base_rtx==r11, this is something like > > (set (mem/c:BLK (reg:SI 11 11) [0 A8]) > (unspec:BLK [ > (mem/c:BLK (reg:SI 11 11) [0 A8]) > (mem/c:BLK (reg/f:SI 1 1) [0 A8]) Looks similar to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282#c15
On Nov 8, 2011, at 2:40 PM, Alan Modra wrote: > On Tue, Nov 08, 2011 at 11:37:57AM +0100, Olivier Hainque wrote: >> Joseph resorted to mem:scratch to impose a strong barrier. That's certainly >> safe and I don't think the performance impact can be significant, so this >> looks like a good way out. > > I agree. Our fancy powerpc stack deallocation barriers rely too much > on alias analysis, which as you've shown (thanks!) isn't reliable in > this instance. Other targets just use the mem:scratch barrier. Right. While the instance we have at hand here involves a bug I think (r11 perceived as a base reg to symbol), I don't see that we have strong guarantees that the current assumptions hold for sure in absence of such a bug. >> I tried an approach in between here: replace the stack_tie insn by a >> frame_tie insn with frame_base_rtx as the second base. In the case above, >> with frame_base_rtx==r11, this is something like >> >> (set (mem/c:BLK (reg:SI 11 11) [0 A8]) >> (unspec:BLK [ >> (mem/c:BLK (reg:SI 11 11) [0 A8]) >> (mem/c:BLK (reg/f:SI 1 1) [0 A8]) > > Looks similar to > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282#c15 Exactly. The idea here is to use the same mechanism in the other places where emit_epilogue emits a tie, making sure that the tie mem access will conflict with the frame accesses by issuing a write from the same base register. Now, if we can indeed have frame accesses using different base registers and a tie is needed after that (needs double checking), well, we'll need several of them. We could probably abstract this away and "just" keep track of a set of base registers used before a tie gets emitted. I'm not convinced that the potential gain is worth the extra complexity and potential risk of running into another subtle subcase, with hard to track sporadic runtime failures for starters. I don't have numbers though. That's a port maintainer call, I guess ? Olivier
On Nov 9, 2011, at 18:15 , Olivier Hainque wrote: > I'm not convinced that the potential gain is worth the extra > complexity and potential risk of running into another subtle > subcase, with hard to track sporadic runtime failures for > starters. I don't have numbers though. > > That's a port maintainer call, I guess ? David, opinion on this point ? My understanding is that we have two options 1) try to preserve the current attempt at maximizing optimization opportunities with precise stack/frame tie insns, 2) simplify for a slightly more brutal option, with a strong (mem:blk scratch) barrier instead My feeling is that 2 would be a sensible option. Trying to get the precise insns right has caused multiple issues (several PRs about hard to track silent wrong code generated, trickier implementation), and I'm not convinced that the legitimate code efficiency gains are worth the trouble. As I wrote, I don't have numbers to backup the latter point though. Thanks for your feedback, Olivier
On Wed, Nov 16, 2011 at 6:54 AM, Olivier Hainque <hainque@adacore.com> wrote: > > On Nov 9, 2011, at 18:15 , Olivier Hainque wrote: >> I'm not convinced that the potential gain is worth the extra >> complexity and potential risk of running into another subtle >> subcase, with hard to track sporadic runtime failures for >> starters. I don't have numbers though. >> >> That's a port maintainer call, I guess ? > > David, opinion on this point ? > > My understanding is that we have two options > > 1) try to preserve the current attempt at maximizing > optimization opportunities with precise stack/frame > tie insns, > > 2) simplify for a slightly more brutal option, with > a strong (mem:blk scratch) barrier instead > > My feeling is that 2 would be a sensible option. > > Trying to get the precise insns right has caused multiple > issues (several PRs about hard to track silent wrong code > generated, trickier implementation), and I'm not convinced > that the legitimate code efficiency gains are worth the > trouble. > > As I wrote, I don't have numbers to backup the latter point > though. We can try (2), but we will have to benchmark it to determine the impact on performance. Thanks, David
Coincidentally, IBM's XL compiler is encountering a similar issue and proposing a similar solution of having the stack pointer update conflict with all memory accesses. I think all these corner cases confirm that it is impractical for the compiler to track all accesses against SP and the only expedient solution is a hammer like conflicting with all memory. Thanks, David
On Nov 17, 2011, at 20:46 , David Edelsohn wrote: > Coincidentally, IBM's XL compiler is encountering a similar issue and > proposing a similar solution of having the stack pointer update > conflict with all memory accesses. Interesting timing :-) > I think all these corner cases > confirm that it is impractical for the compiler to track all accesses > against SP and the only expedient solution is a hammer like > conflicting with all memory. OK, I'll come up with a patch. I'll compare the generated assembly for the Ada runtime lib with and without the change. That should be easy to do and will provide a datapoint. Thanks for your feedback. Olivier
Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 181085) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -11989,7 +11989,7 @@ (define_expand "restore_stack_block" [(set (match_dup 2) (match_dup 3)) (set (match_dup 4) (match_dup 2)) - (set (match_dup 5) (unspec:BLK [(match_dup 5)] UNSPEC_TIE)) + (set (mem:BLK (scratch)) (unspec:BLK [(match_dup 5)] UNSPEC_TIE)) (set (match_operand 0 "register_operand" "") (match_operand 1 "register_operand" ""))] "" @@ -12022,7 +12022,7 @@ [(set (match_dup 2) (match_operand 1 "memory_operand" "")) (set (match_dup 3) (match_dup 4)) (set (match_dup 5) (match_dup 2)) - (set (match_dup 6) (unspec:BLK [(match_dup 6)] UNSPEC_TIE)) + (set (mem:BLK (scratch)) (unspec:BLK [(match_dup 6)] UNSPEC_TIE)) (set (match_operand 0 "register_operand" "") (match_dup 3))] "" " @@ -15902,8 +15902,8 @@ ; These are to explain that changes to the stack pointer should ; not be moved over stores to stack memory. (define_insn "stack_tie" - [(set (match_operand:BLK 0 "memory_operand" "+m") - (unspec:BLK [(match_dup 0)] UNSPEC_TIE))] + [(set (mem:BLK (scratch)) + (unspec:BLK [(match_operand:BLK 0 "memory_operand" "+m")] UNSPEC_TIE))] "" "" [(set_attr "length" "0")])