Message ID | alpine.LSU.2.21.1708071531210.15613@wotan.suse.de |
---|---|
State | New |
Headers | show |
On Aug 07 2017, Michael Matz <matz@suse.de> wrote:
> +/* { dg-final { scan-assembler "%\[re\]bp" } } */
Please use {} for regexps.
Andreas.
On Mon, Aug 7, 2017 at 6:32 AM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Mon, 7 Aug 2017, H.J. Lu wrote: > >> >> This will break unwinders relying on frame pointers to exist on all >> >> functions, for which projects conciously forced a frame pointer with this >> >> option. I don't think we can simply override user specified explicit >> >> wishes in this way, presumably they had a reason to use it. >> > >> > Hm... yes, you are right. >> > >> > HJ, please revert the patch. >> > >> >> Is there a testcae? I'd like to add it. > > Trivial change of your first example, see below. > >> [hjl@gnu-tools-1 pr81736]$ clang -S -O2 -fno-omit-frame-pointer x.i >> [hjl@gnu-tools-1 pr81736]$ cat x.s > [... no frame ...] >> [hjl@gnu-tools-1 pr81736]$ >> >> Does it mean clang is broken? > > In my book, yes. Does GCC do this for all targets or just x86? > > Ciao, > Michael. > > Index: gcc/testsuite/gcc.target/i386/force-frame.c > =================================================================== > --- gcc/testsuite/gcc.target/i386/force-frame.c (revision 0) > +++ gcc/testsuite/gcc.target/i386/force-frame.c (working copy) > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-omit-frame-pointer" } */ > + > +int > +#ifndef __x86_64__ > +__attribute__((regparm(3))) > +#endif > +foo (int i) > +{ > + return i; > +} > + > +/* The user wants a frame pointer. */ > +/* { dg-final { scan-assembler "%\[re\]bp" } } */ I am looking for a run-time test which breaks unwinder.
Hi, On Mon, 7 Aug 2017, H.J. Lu wrote: > >> [hjl@gnu-tools-1 pr81736]$ > >> > >> Does it mean clang is broken? > > > > In my book, yes. > > Does GCC do this for all targets or just x86? No idea. If so I'd say those other targets are broken as well (as long as the concept of frame pointer makes sense on them, their ABI defines one but leaves it optional and something like an unwinder could make use of it). > I am looking for a run-time test which breaks unwinder. I don't have one handy. Idea: make two threads, one endlessly looping in the "frame-less" function, the other causing a signal to the first thread, and the signal handler checking that unwinding up to caller of frame_less() is possible via %[er]bp chaining. Ciao, Michael.
On Mon, 7 Aug 2017, Michael Matz wrote: > > I am looking for a run-time test which breaks unwinder. > > I don't have one handy. Idea: make two threads, one endlessly looping in > the "frame-less" function, the other causing a signal to the first thread, > and the signal handler checking that unwinding up to caller of > frame_less() is possible via %[er]bp chaining. You'd probably have to arrange frame_less modify %rbp, otherwise unwinding might "appear to work" by virtue of %rbp being valid for the outer frame. I think one specific, real-life use case that may be potentially hurt by this change is using linux-perf with backtrace recording, for programs with hot functions that don't otherwise access the stack (which is plausible for leaf functions with hot loops). Alexander
On Mon, Aug 7, 2017 at 7:06 AM, Alexander Monakov <amonakov@ispras.ru> wrote: > On Mon, 7 Aug 2017, Michael Matz wrote: >> > I am looking for a run-time test which breaks unwinder. >> >> I don't have one handy. Idea: make two threads, one endlessly looping in >> the "frame-less" function, the other causing a signal to the first thread, >> and the signal handler checking that unwinding up to caller of >> frame_less() is possible via %[er]bp chaining. > > You'd probably have to arrange frame_less modify %rbp, otherwise unwinding > might "appear to work" by virtue of %rbp being valid for the outer frame. > > I think one specific, real-life use case that may be potentially hurt by > this change is using linux-perf with backtrace recording, for programs with > hot functions that don't otherwise access the stack (which is plausible for > leaf functions with hot loops). > > Alexander This code is very silly with very little benefit: [hjl@gnu-6 tmp]$ cat x.c extern void bar (void); void foo (void) { bar (); } [hjl@gnu-6 tmp]$ gcc -fno-omit-frame-pointer x.c -S -O2 [hjl@gnu-6 tmp]$ cat x.s .file "x.c" .text .p2align 4,,15 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 popq %rbp .cfi_def_cfa 7, 8 jmp bar .cfi_endproc .LFE0: .size foo, .-foo .ident "GCC: (GNU) 7.1.1 20170709 (Red Hat 7.1.1-4)" .section .note.GNU-stack,"",@progbits [hjl@gnu-6 tmp]$ When another compiler does this optimization, applications won't expect pushq %rbp movq %rsp, %rbp popq %rbp jmp bar When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer. this optimization removes more than 730 pushq %rbp movq %rsp, %rbp popq %rbp Can we apply this optimization when function body has less than 6 instructions, similar to ix86_pad_short_function?
On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote: > When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer. > this optimization removes more than 730 > > pushq %rbp > movq %rsp, %rbp > popq %rbp If you don't want the frame pointer, why are you compiling with -fno-omit-frame-pointer? Are you going to add -fforce-no-omit-frame-pointer or something similar so that people can actually get what they are asking for? This doesn't really make sense. It is perfectly fine to omit frame pointer by default, when it isn't required for something, but if the user asks for it, we shouldn't ignore his request. Jakub
On 8/7/2017 8:43 AM, Jakub Jelinek wrote: > On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote: >> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer. >> this optimization removes more than 730 >> >> pushq %rbp >> movq %rsp, %rbp >> popq %rbp > > If you don't want the frame pointer, why are you compiling with > -fno-omit-frame-pointer? Are you going to add > -fforce-no-omit-frame-pointer or something similar so that people can > actually get what they are asking for? This doesn't really make sense. > It is perfectly fine to omit frame pointer by default, when it isn't > required for something, but if the user asks for it, we shouldn't ignore his > request. > wanting a framepointer is very nice and desired... ... but if the optimizer/ins scheduler moves instructions outside of the frame'd portion, (it does it for cases like below as well), the value is already negative for these functions that don't have stack use. <MPIDU_Sched_are_pending@@Base>: mov all_schedules@@Base-0x38460,%rax push %rbp mov %rsp,%rbp pop %rbp cmpq $0x0,(%rax) setne %al movzbl %al,%eax retq (gcc 7.1 compiling mpich with LTO) specifically the push/mov/pop back to back makes no sense at all to me. if there was meat before the pop, sure. but when there isn't......
Hi, On Mon, 7 Aug 2017, Arjan van de Ven wrote: > wanting a framepointer is very nice and desired... > ... but if the optimizer/ins scheduler moves instructions outside of the > frame'd portion, > (it does it for cases like below as well), the value is already negative for > these > functions that don't have stack use. > > <MPIDU_Sched_are_pending@@Base>: > mov all_schedules@@Base-0x38460,%rax > push %rbp > mov %rsp,%rbp > pop %rbp > cmpq $0x0,(%rax) > setne %al > movzbl %al,%eax > retq Then don't compile with -fno-omit-frame-pointer. You explicitely requested one, so why are you surprised to see one? > specifically the push/mov/pop back to back makes no sense at all to me. > if there was meat before the pop, sure. > but when there isn't...... That can be fixed by making the pop a real scheduling barrier. If we should do that I don't know (I'd lean towards not having to do that for loopless code). Ciao, Michael.
On 8/7/2017 9:16 AM, Michael Matz wrote: > Hi, > > On Mon, 7 Aug 2017, Arjan van de Ven wrote: > >> wanting a framepointer is very nice and desired... >> ... but if the optimizer/ins scheduler moves instructions outside of the >> frame'd portion, >> (it does it for cases like below as well), the value is already negative for >> these >> functions that don't have stack use. >> >> <MPIDU_Sched_are_pending@@Base>: >> mov all_schedules@@Base-0x38460,%rax >> push %rbp >> mov %rsp,%rbp >> pop %rbp >> cmpq $0x0,(%rax) >> setne %al >> movzbl %al,%eax >> retq > > Then don't compile with -fno-omit-frame-pointer. You explicitely > requested one, so why are you surprised to see one? I'm not surprised to see one. I'm surprised to see a useless one. The "perf" benefit is real, and that's why I asked for one... but the reorder made it an expensive 3 instruction nop for all intents and purposes. If the pop was just before the ret, sure. It's not. Maybe a different angle would be for a peephole phase to just eliminate the useless (even for those who do want a frame pointer) push/mov/pop
On Mon, Aug 7, 2017 at 9:19 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 8/7/2017 9:16 AM, Michael Matz wrote: >> >> Hi, >> >> On Mon, 7 Aug 2017, Arjan van de Ven wrote: >> >>> wanting a framepointer is very nice and desired... >>> ... but if the optimizer/ins scheduler moves instructions outside of the >>> frame'd portion, >>> (it does it for cases like below as well), the value is already negative >>> for >>> these >>> functions that don't have stack use. >>> >>> <MPIDU_Sched_are_pending@@Base>: >>> mov all_schedules@@Base-0x38460,%rax >>> push %rbp >>> mov %rsp,%rbp >>> pop %rbp >>> cmpq $0x0,(%rax) >>> setne %al >>> movzbl %al,%eax >>> retq >> >> >> Then don't compile with -fno-omit-frame-pointer. You explicitely >> requested one, so why are you surprised to see one? > > > I'm not surprised to see one. > I'm surprised to see a useless one. > > The "perf" benefit is real, and that's why I asked for one... but the > reorder made it an expensive > 3 instruction nop for all intents and purposes. > If the pop was just before the ret, sure. It's not. > > Maybe a different angle would be for a peephole phase to just eliminate the > useless (even for those > who do want a frame pointer) push/mov/pop > I have seen GCC generate push %rbp pop %rbp
Hi, On Mon, 7 Aug 2017, Arjan van de Ven wrote: > I'm not surprised to see one. > I'm surprised to see a useless one. > > The "perf" benefit is real, and that's why I asked for one... but the reorder > made it an expensive 3 instruction nop for all intents and purposes. > If the pop was just before the ret, sure. It's not. Okay, that seems a reasonable request. But IMHO independend from the issue of simply ignoring -fno-omit-frame-pointer to which I object. > Maybe a different angle would be for a peephole phase to just eliminate > the useless (even for those who do want a frame pointer) push/mov/pop For instance. Ciao, Michael.
On Mon, Aug 7, 2017 at 3:48 PM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Mon, 7 Aug 2017, H.J. Lu wrote: > >> >> [hjl@gnu-tools-1 pr81736]$ >> >> >> >> Does it mean clang is broken? >> > >> > In my book, yes. >> >> Does GCC do this for all targets or just x86? > > No idea. If so I'd say those other targets are broken as well (as long as > the concept of frame pointer makes sense on them, their ABI defines one > but leaves it optional and something like an unwinder could make use of > it). > >> I am looking for a run-time test which breaks unwinder. > > I don't have one handy. Idea: make two threads, one endlessly looping in > the "frame-less" function, the other causing a signal to the first thread, > and the signal handler checking that unwinding up to caller of > frame_less() is possible via %[er]bp chaining. OTOH, even if the function doesn't make its own frame, the %[er]bp is still untouched and points to the calling function frame. So, I think that even in that case unwinding should be unharmed. Frame setupsthat HJ presented are really unneeded, so there is a clear benefit. How about we keep the patch for a while and see if something indeed breaks? Linux live patching looks quite fragile in this area. Uros.
Arjan van de Ven <arjan@linux.intel.com> writes: > On 8/7/2017 8:43 AM, Jakub Jelinek wrote: >> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote: >>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer. >>> this optimization removes more than 730 >>> >>> pushq %rbp >>> movq %rsp, %rbp >>> popq %rbp >> >> If you don't want the frame pointer, why are you compiling with >> -fno-omit-frame-pointer? Are you going to add >> -fforce-no-omit-frame-pointer or something similar so that people can >> actually get what they are asking for? This doesn't really make sense. >> It is perfectly fine to omit frame pointer by default, when it isn't >> required for something, but if the user asks for it, we shouldn't ignore his >> request. >> > > > wanting a framepointer is very nice and desired... ... but if the > optimizer/ins scheduler moves instructions outside of the frame'd > portion, (it does it for cases like below as well), the value is > already negative for these functions that don't have stack use. > > <MPIDU_Sched_are_pending@@Base>: > mov all_schedules@@Base-0x38460,%rax > push %rbp > mov %rsp,%rbp > pop %rbp > cmpq $0x0,(%rax) > setne %al > movzbl %al,%eax > retq Yeah, and it could be even weirder for big single-block functions. I think GCC has been doing this kind of scheduling of prologue and epilogue instructions for a while, so there hasn*t really been a guarantee which parts of the function will have a new FP and which will still have the old one. Also, with an arbitrarily-picked host compiler (GCC 6.3.1), shrink-wrapping kicks in when the following is compiled with -O3 -fno-omit-frame-pointer: void f (int *); void g (int *x) { for (int i = 0; i < 1000; ++i) x[i] += 1; if (x[0]) { int temp; f (&temp); } } so only the block with the call to f sets up FP. The relatively long-running loop runs with the caller's FP. I hope we can go for a target-independent position that what HJ*s patch does is OK... Thanks, Richard
Index: gcc/testsuite/gcc.target/i386/force-frame.c =================================================================== --- gcc/testsuite/gcc.target/i386/force-frame.c (revision 0) +++ gcc/testsuite/gcc.target/i386/force-frame.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-omit-frame-pointer" } */ + +int +#ifndef __x86_64__ +__attribute__((regparm(3))) +#endif +foo (int i) +{ + return i; +} + +/* The user wants a frame pointer. */ +/* { dg-final { scan-assembler "%\[re\]bp" } } */