diff mbox

i386: Don't use frame pointer without stack access

Message ID alpine.LSU.2.21.1708071531210.15613@wotan.suse.de
State New
Headers show

Commit Message

Michael Matz Aug. 7, 2017, 1:32 p.m. UTC
Hi,

On Mon, 7 Aug 2017, H.J. Lu wrote:

> >> This will break unwinders relying on frame pointers to exist on all
> >> functions, for which projects conciously forced a frame pointer with this
> >> option.  I don't think we can simply override user specified explicit
> >> wishes in this way, presumably they had a reason to use it.
> >
> > Hm... yes, you are right.
> >
> > HJ, please revert the patch.
> >
> 
> Is there a testcae?  I'd like to add it.

Trivial change of your first example, see below.

> [hjl@gnu-tools-1 pr81736]$ clang -S -O2 -fno-omit-frame-pointer x.i
> [hjl@gnu-tools-1 pr81736]$ cat x.s
[... no frame ...]
> [hjl@gnu-tools-1 pr81736]$
> 
> Does it mean clang is broken?

In my book, yes.


Ciao,
Michael.

Comments

Andreas Schwab Aug. 7, 2017, 1:38 p.m. UTC | #1
On Aug 07 2017, Michael Matz <matz@suse.de> wrote:

> +/* { dg-final { scan-assembler "%\[re\]bp" } } */

Please use {} for regexps.

Andreas.
H.J. Lu Aug. 7, 2017, 1:38 p.m. UTC | #2
On Mon, Aug 7, 2017 at 6:32 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 7 Aug 2017, H.J. Lu wrote:
>
>> >> This will break unwinders relying on frame pointers to exist on all
>> >> functions, for which projects conciously forced a frame pointer with this
>> >> option.  I don't think we can simply override user specified explicit
>> >> wishes in this way, presumably they had a reason to use it.
>> >
>> > Hm... yes, you are right.
>> >
>> > HJ, please revert the patch.
>> >
>>
>> Is there a testcae?  I'd like to add it.
>
> Trivial change of your first example, see below.
>
>> [hjl@gnu-tools-1 pr81736]$ clang -S -O2 -fno-omit-frame-pointer x.i
>> [hjl@gnu-tools-1 pr81736]$ cat x.s
> [... no frame ...]
>> [hjl@gnu-tools-1 pr81736]$
>>
>> Does it mean clang is broken?
>
> In my book, yes.

Does GCC do this for all targets or just x86?

>
> Ciao,
> Michael.
>
> Index: gcc/testsuite/gcc.target/i386/force-frame.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/force-frame.c (revision 0)
> +++ gcc/testsuite/gcc.target/i386/force-frame.c (working copy)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-omit-frame-pointer" } */
> +
> +int
> +#ifndef __x86_64__
> +__attribute__((regparm(3)))
> +#endif
> +foo (int i)
> +{
> +  return i;
> +}
> +
> +/* The user wants a frame pointer.  */
> +/* { dg-final { scan-assembler "%\[re\]bp" } } */

I am looking for a run-time test which breaks unwinder.
Michael Matz Aug. 7, 2017, 1:48 p.m. UTC | #3
Hi,

On Mon, 7 Aug 2017, H.J. Lu wrote:

> >> [hjl@gnu-tools-1 pr81736]$
> >>
> >> Does it mean clang is broken?
> >
> > In my book, yes.
> 
> Does GCC do this for all targets or just x86?

No idea.  If so I'd say those other targets are broken as well (as long as 
the concept of frame pointer makes sense on them, their ABI defines one 
but leaves it optional and something like an unwinder could make use of 
it).

> I am looking for a run-time test which breaks unwinder.

I don't have one handy.  Idea: make two threads, one endlessly looping in 
the "frame-less" function, the other causing a signal to the first thread, 
and the signal handler checking that unwinding up to caller of 
frame_less() is possible via %[er]bp chaining.


Ciao,
Michael.
Alexander Monakov Aug. 7, 2017, 2:06 p.m. UTC | #4
On Mon, 7 Aug 2017, Michael Matz wrote:
> > I am looking for a run-time test which breaks unwinder.
> 
> I don't have one handy.  Idea: make two threads, one endlessly looping in 
> the "frame-less" function, the other causing a signal to the first thread, 
> and the signal handler checking that unwinding up to caller of 
> frame_less() is possible via %[er]bp chaining.

You'd probably have to arrange frame_less modify %rbp, otherwise unwinding
might "appear to work" by virtue of %rbp being valid for the outer frame.

I think one specific, real-life use case that may be potentially hurt by
this change is using linux-perf with backtrace recording, for programs with
hot functions that don't otherwise access the stack (which is plausible for
leaf functions with hot loops).

Alexander
H.J. Lu Aug. 7, 2017, 3:39 p.m. UTC | #5
On Mon, Aug 7, 2017 at 7:06 AM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Mon, 7 Aug 2017, Michael Matz wrote:
>> > I am looking for a run-time test which breaks unwinder.
>>
>> I don't have one handy.  Idea: make two threads, one endlessly looping in
>> the "frame-less" function, the other causing a signal to the first thread,
>> and the signal handler checking that unwinding up to caller of
>> frame_less() is possible via %[er]bp chaining.
>
> You'd probably have to arrange frame_less modify %rbp, otherwise unwinding
> might "appear to work" by virtue of %rbp being valid for the outer frame.
>
> I think one specific, real-life use case that may be potentially hurt by
> this change is using linux-perf with backtrace recording, for programs with
> hot functions that don't otherwise access the stack (which is plausible for
> leaf functions with hot loops).
>
> Alexander

This code is very silly with very little benefit:

[hjl@gnu-6 tmp]$ cat x.c
extern void bar (void);

void
foo (void)
{
  bar ();
}
[hjl@gnu-6 tmp]$ gcc -fno-omit-frame-pointer x.c -S -O2
[hjl@gnu-6 tmp]$ cat x.s
.file "x.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
popq %rbp
.cfi_def_cfa 7, 8
jmp bar
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (GNU) 7.1.1 20170709 (Red Hat 7.1.1-4)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 tmp]$

When another compiler does this optimization, applications won't
expect

pushq %rbp
movq %rsp, %rbp
popq %rbp
jmp bar

When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
this optimization removes more than 730

pushq %rbp
movq %rsp, %rbp
popq %rbp

Can we apply this optimization when function body has less than 6
instructions, similar to ix86_pad_short_function?
Jakub Jelinek Aug. 7, 2017, 3:43 p.m. UTC | #6
On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
> this optimization removes more than 730
> 
> pushq %rbp
> movq %rsp, %rbp
> popq %rbp

If you don't want the frame pointer, why are you compiling with
-fno-omit-frame-pointer?  Are you going to add
-fforce-no-omit-frame-pointer or something similar so that people can
actually get what they are asking for?  This doesn't really make sense.
It is perfectly fine to omit frame pointer by default, when it isn't
required for something, but if the user asks for it, we shouldn't ignore his
request.

	Jakub
Arjan van de Ven Aug. 7, 2017, 4:06 p.m. UTC | #7
On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>> this optimization removes more than 730
>>
>> pushq %rbp
>> movq %rsp, %rbp
>> popq %rbp
>
> If you don't want the frame pointer, why are you compiling with
> -fno-omit-frame-pointer?  Are you going to add
> -fforce-no-omit-frame-pointer or something similar so that people can
> actually get what they are asking for?  This doesn't really make sense.
> It is perfectly fine to omit frame pointer by default, when it isn't
> required for something, but if the user asks for it, we shouldn't ignore his
> request.
>


wanting a framepointer is very nice and desired...
... but if the optimizer/ins scheduler moves instructions outside of the frame'd portion,
(it does it for cases like below as well), the value is already negative for these
functions that don't have stack use.

<MPIDU_Sched_are_pending@@Base>:
mov    all_schedules@@Base-0x38460,%rax
push   %rbp
mov    %rsp,%rbp
pop    %rbp
cmpq   $0x0,(%rax)
setne  %al
movzbl %al,%eax
retq

(gcc 7.1 compiling mpich with LTO)

specifically the push/mov/pop back to back makes no sense at all to me.
if there was meat before the pop, sure.
but when there isn't......
Michael Matz Aug. 7, 2017, 4:16 p.m. UTC | #8
Hi,

On Mon, 7 Aug 2017, Arjan van de Ven wrote:

> wanting a framepointer is very nice and desired...
> ... but if the optimizer/ins scheduler moves instructions outside of the
> frame'd portion,
> (it does it for cases like below as well), the value is already negative for
> these
> functions that don't have stack use.
> 
> <MPIDU_Sched_are_pending@@Base>:
> mov    all_schedules@@Base-0x38460,%rax
> push   %rbp
> mov    %rsp,%rbp
> pop    %rbp
> cmpq   $0x0,(%rax)
> setne  %al
> movzbl %al,%eax
> retq

Then don't compile with -fno-omit-frame-pointer.  You explicitely 
requested one, so why are you surprised to see one?

> specifically the push/mov/pop back to back makes no sense at all to me.
> if there was meat before the pop, sure.
> but when there isn't......

That can be fixed by making the pop a real scheduling barrier.  If we 
should do that I don't know (I'd lean towards not having to do that for 
loopless code).


Ciao,
Michael.
Arjan van de Ven Aug. 7, 2017, 4:19 p.m. UTC | #9
On 8/7/2017 9:16 AM, Michael Matz wrote:
> Hi,
>
> On Mon, 7 Aug 2017, Arjan van de Ven wrote:
>
>> wanting a framepointer is very nice and desired...
>> ... but if the optimizer/ins scheduler moves instructions outside of the
>> frame'd portion,
>> (it does it for cases like below as well), the value is already negative for
>> these
>> functions that don't have stack use.
>>
>> <MPIDU_Sched_are_pending@@Base>:
>> mov    all_schedules@@Base-0x38460,%rax
>> push   %rbp
>> mov    %rsp,%rbp
>> pop    %rbp
>> cmpq   $0x0,(%rax)
>> setne  %al
>> movzbl %al,%eax
>> retq
>
> Then don't compile with -fno-omit-frame-pointer.  You explicitely
> requested one, so why are you surprised to see one?

I'm not surprised to see one.
I'm surprised to see a useless one.

The "perf" benefit is real, and that's why I asked for one... but the reorder made it an expensive
3 instruction nop for all intents and purposes.
If the pop was just before the ret, sure. It's not.

Maybe a different angle would be for a peephole phase to just eliminate the useless (even for those
who do want a frame pointer) push/mov/pop
H.J. Lu Aug. 7, 2017, 4:21 p.m. UTC | #10
On Mon, Aug 7, 2017 at 9:19 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 8/7/2017 9:16 AM, Michael Matz wrote:
>>
>> Hi,
>>
>> On Mon, 7 Aug 2017, Arjan van de Ven wrote:
>>
>>> wanting a framepointer is very nice and desired...
>>> ... but if the optimizer/ins scheduler moves instructions outside of the
>>> frame'd portion,
>>> (it does it for cases like below as well), the value is already negative
>>> for
>>> these
>>> functions that don't have stack use.
>>>
>>> <MPIDU_Sched_are_pending@@Base>:
>>> mov    all_schedules@@Base-0x38460,%rax
>>> push   %rbp
>>> mov    %rsp,%rbp
>>> pop    %rbp
>>> cmpq   $0x0,(%rax)
>>> setne  %al
>>> movzbl %al,%eax
>>> retq
>>
>>
>> Then don't compile with -fno-omit-frame-pointer.  You explicitely
>> requested one, so why are you surprised to see one?
>
>
> I'm not surprised to see one.
> I'm surprised to see a useless one.
>
> The "perf" benefit is real, and that's why I asked for one... but the
> reorder made it an expensive
> 3 instruction nop for all intents and purposes.
> If the pop was just before the ret, sure. It's not.
>
> Maybe a different angle would be for a peephole phase to just eliminate the
> useless (even for those
> who do want a frame pointer) push/mov/pop
>

I have seen GCC generate

push %rbp
pop %rbp
Michael Matz Aug. 7, 2017, 4:28 p.m. UTC | #11
Hi,

On Mon, 7 Aug 2017, Arjan van de Ven wrote:

> I'm not surprised to see one.
> I'm surprised to see a useless one.
> 
> The "perf" benefit is real, and that's why I asked for one... but the reorder
> made it an expensive 3 instruction nop for all intents and purposes.
> If the pop was just before the ret, sure. It's not.

Okay, that seems a reasonable request.  But IMHO independend from the 
issue of simply ignoring -fno-omit-frame-pointer to which I object.

> Maybe a different angle would be for a peephole phase to just eliminate 
> the useless (even for those who do want a frame pointer) push/mov/pop

For instance.


Ciao,
Michael.
Uros Bizjak Aug. 7, 2017, 6:40 p.m. UTC | #12
On Mon, Aug 7, 2017 at 3:48 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 7 Aug 2017, H.J. Lu wrote:
>
>> >> [hjl@gnu-tools-1 pr81736]$
>> >>
>> >> Does it mean clang is broken?
>> >
>> > In my book, yes.
>>
>> Does GCC do this for all targets or just x86?
>
> No idea.  If so I'd say those other targets are broken as well (as long as
> the concept of frame pointer makes sense on them, their ABI defines one
> but leaves it optional and something like an unwinder could make use of
> it).
>
>> I am looking for a run-time test which breaks unwinder.
>
> I don't have one handy.  Idea: make two threads, one endlessly looping in
> the "frame-less" function, the other causing a signal to the first thread,
> and the signal handler checking that unwinding up to caller of
> frame_less() is possible via %[er]bp chaining.

OTOH, even if the function doesn't make its own frame, the %[er]bp is
still untouched and points to the calling function frame. So, I think
that even in that case unwinding should be unharmed.

Frame setupsthat HJ presented are really unneeded, so there is a clear benefit.

How about we keep the patch for a while and see if something indeed
breaks? Linux live patching looks quite fragile in this area.

Uros.
Richard Sandiford Aug. 7, 2017, 8:05 p.m. UTC | #13
Arjan van de Ven <arjan@linux.intel.com> writes:
> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>>> this optimization removes more than 730
>>>
>>> pushq %rbp
>>> movq %rsp, %rbp
>>> popq %rbp
>>
>> If you don't want the frame pointer, why are you compiling with
>> -fno-omit-frame-pointer?  Are you going to add
>> -fforce-no-omit-frame-pointer or something similar so that people can
>> actually get what they are asking for?  This doesn't really make sense.
>> It is perfectly fine to omit frame pointer by default, when it isn't
>> required for something, but if the user asks for it, we shouldn't ignore his
>> request.
>>
>
>
> wanting a framepointer is very nice and desired...  ... but if the
> optimizer/ins scheduler moves instructions outside of the frame'd
> portion, (it does it for cases like below as well), the value is
> already negative for these functions that don't have stack use.
>
> <MPIDU_Sched_are_pending@@Base>:
> mov    all_schedules@@Base-0x38460,%rax
> push   %rbp
> mov    %rsp,%rbp
> pop    %rbp
> cmpq   $0x0,(%rax)
> setne  %al
> movzbl %al,%eax
> retq

Yeah, and it could be even weirder for big single-block functions.
I think GCC has been doing this kind of scheduling of prologue and
epilogue instructions for a while, so there hasn*t really been a
guarantee which parts of the function will have a new FP and which
will still have the old one.

Also, with an arbitrarily-picked host compiler (GCC 6.3.1), shrink-wrapping
kicks in when the following is compiled with -O3 -fno-omit-frame-pointer:

    void f (int *);
    void
    g (int *x)
    {
      for (int i = 0; i < 1000; ++i)
        x[i] += 1;
      if (x[0])
        {
          int temp;
          f (&temp);
        }
    }

so only the block with the call to f sets up FP.  The relatively
long-running loop runs with the caller's FP.

I hope we can go for a target-independent position that what HJ*s
patch does is OK...

Thanks,
Richard
diff mbox

Patch

Index: gcc/testsuite/gcc.target/i386/force-frame.c
===================================================================
--- gcc/testsuite/gcc.target/i386/force-frame.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/force-frame.c	(working copy)
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+int
+#ifndef __x86_64__
+__attribute__((regparm(3)))
+#endif
+foo (int i)
+{
+  return i;
+}
+
+/* The user wants a frame pointer.  */
+/* { dg-final { scan-assembler "%\[re\]bp" } } */