diff mbox

i386: Don't use frame pointer without stack access

Message ID CAMe9rOphsdCFbXG63_Qsvdm5Rq3NDzoXLBzJ-vRWRZXMwSQzVQ@mail.gmail.com
State New
Headers show

Commit Message

H.J. Lu Aug. 8, 2017, 4:38 p.m. UTC
On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Arjan van de Ven <arjan@linux.intel.com> writes:
>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>>>> this optimization removes more than 730
>>>>
>>>> pushq %rbp
>>>> movq %rsp, %rbp
>>>> popq %rbp
>>>
>>> If you don't want the frame pointer, why are you compiling with
>>> -fno-omit-frame-pointer?  Are you going to add
>>> -fforce-no-omit-frame-pointer or something similar so that people can
>>> actually get what they are asking for?  This doesn't really make sense.
>>> It is perfectly fine to omit frame pointer by default, when it isn't
>>> required for something, but if the user asks for it, we shouldn't ignore his
>>> request.
>>>
>>
>>
>> wanting a framepointer is very nice and desired...  ... but if the
>> optimizer/ins scheduler moves instructions outside of the frame'd
>> portion, (it does it for cases like below as well), the value is
>> already negative for these functions that don't have stack use.
>>
>> <MPIDU_Sched_are_pending@@Base>:
>> mov    all_schedules@@Base-0x38460,%rax
>> push   %rbp
>> mov    %rsp,%rbp
>> pop    %rbp
>> cmpq   $0x0,(%rax)
>> setne  %al
>> movzbl %al,%eax
>> retq
>
> Yeah, and it could be even weirder for big single-block functions.
> I think GCC has been doing this kind of scheduling of prologue and
> epilogue instructions for a while, so there hasn*t really been a
> guarantee which parts of the function will have a new FP and which
> will still have the old one.
>
> Also, with an arbitrarily-picked host compiler (GCC 6.3.1), shrink-wrapping
> kicks in when the following is compiled with -O3 -fno-omit-frame-pointer:
>
>     void f (int *);
>     void
>     g (int *x)
>     {
>       for (int i = 0; i < 1000; ++i)
>         x[i] += 1;
>       if (x[0])
>         {
>           int temp;
>           f (&temp);
>         }
>     }
>
> so only the block with the call to f sets up FP.  The relatively
> long-running loop runs with the caller's FP.
>
> I hope we can go for a target-independent position that what HJ*s
> patch does is OK...
>

In light of this,  I am resubmitting my patch.  I added 3 more testcases
and also handle:

typedef int v8si __attribute__ ((vector_size (32)));

void
foo (v8si *out_start, v8si *out_end, v8si *regions)
{
    v8si base = regions[3];
    *out_start = base;
    *out_end = base;
}

OK for trunk?

Thanks.

Comments

Richard Biener Aug. 8, 2017, 5 p.m. UTC | #1
On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
><richard.sandiford@linaro.org> wrote:
>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>>>>> this optimization removes more than 730
>>>>>
>>>>> pushq %rbp
>>>>> movq %rsp, %rbp
>>>>> popq %rbp
>>>>
>>>> If you don't want the frame pointer, why are you compiling with
>>>> -fno-omit-frame-pointer?  Are you going to add
>>>> -fforce-no-omit-frame-pointer or something similar so that people
>can
>>>> actually get what they are asking for?  This doesn't really make
>sense.
>>>> It is perfectly fine to omit frame pointer by default, when it
>isn't
>>>> required for something, but if the user asks for it, we shouldn't
>ignore his
>>>> request.
>>>>
>>>
>>>
>>> wanting a framepointer is very nice and desired...  ... but if the
>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>> portion, (it does it for cases like below as well), the value is
>>> already negative for these functions that don't have stack use.
>>>
>>> <MPIDU_Sched_are_pending@@Base>:
>>> mov    all_schedules@@Base-0x38460,%rax
>>> push   %rbp
>>> mov    %rsp,%rbp
>>> pop    %rbp
>>> cmpq   $0x0,(%rax)
>>> setne  %al
>>> movzbl %al,%eax
>>> retq
>>
>> Yeah, and it could be even weirder for big single-block functions.
>> I think GCC has been doing this kind of scheduling of prologue and
>> epilogue instructions for a while, so there hasn*t really been a
>> guarantee which parts of the function will have a new FP and which
>> will still have the old one.
>>
>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>shrink-wrapping
>> kicks in when the following is compiled with -O3
>-fno-omit-frame-pointer:
>>
>>     void f (int *);
>>     void
>>     g (int *x)
>>     {
>>       for (int i = 0; i < 1000; ++i)
>>         x[i] += 1;
>>       if (x[0])
>>         {
>>           int temp;
>>           f (&temp);
>>         }
>>     }
>>
>> so only the block with the call to f sets up FP.  The relatively
>> long-running loop runs with the caller's FP.
>>
>> I hope we can go for a target-independent position that what HJ*s
>> patch does is OK...
>>
>
>In light of this,  I am resubmitting my patch.  I added 3 more
>testcases
>and also handle:
>
>typedef int v8si __attribute__ ((vector_size (32)));
>
>void
>foo (v8si *out_start, v8si *out_end, v8si *regions)
>{
>    v8si base = regions[3];
>    *out_start = base;
>    *out_end = base;
>}
>
>OK for trunk?

The invoker specified -fno-omit-frame-pointer, why did you eliminate it?  I'd argue it's OK when neither -f nor -fno- is explicitly specified irrespective of the default in case we document the change but an explicit -fno- is pretty clear.

And yes, unfortunate placement of frame pointer init/de-init should be fixed.

Richard.

>
>Thanks.
Uros Bizjak Aug. 8, 2017, 5:05 p.m. UTC | #2
On Tue, Aug 8, 2017 at 6:38 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>>>>> this optimization removes more than 730
>>>>>
>>>>> pushq %rbp
>>>>> movq %rsp, %rbp
>>>>> popq %rbp
>>>>
>>>> If you don't want the frame pointer, why are you compiling with
>>>> -fno-omit-frame-pointer?  Are you going to add
>>>> -fforce-no-omit-frame-pointer or something similar so that people can
>>>> actually get what they are asking for?  This doesn't really make sense.
>>>> It is perfectly fine to omit frame pointer by default, when it isn't
>>>> required for something, but if the user asks for it, we shouldn't ignore his
>>>> request.
>>>>
>>>
>>>
>>> wanting a framepointer is very nice and desired...  ... but if the
>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>> portion, (it does it for cases like below as well), the value is
>>> already negative for these functions that don't have stack use.
>>>
>>> <MPIDU_Sched_are_pending@@Base>:
>>> mov    all_schedules@@Base-0x38460,%rax
>>> push   %rbp
>>> mov    %rsp,%rbp
>>> pop    %rbp
>>> cmpq   $0x0,(%rax)
>>> setne  %al
>>> movzbl %al,%eax
>>> retq
>>
>> Yeah, and it could be even weirder for big single-block functions.
>> I think GCC has been doing this kind of scheduling of prologue and
>> epilogue instructions for a while, so there hasn*t really been a
>> guarantee which parts of the function will have a new FP and which
>> will still have the old one.
>>
>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1), shrink-wrapping
>> kicks in when the following is compiled with -O3 -fno-omit-frame-pointer:
>>
>>     void f (int *);
>>     void
>>     g (int *x)
>>     {
>>       for (int i = 0; i < 1000; ++i)
>>         x[i] += 1;
>>       if (x[0])
>>         {
>>           int temp;
>>           f (&temp);
>>         }
>>     }
>>
>> so only the block with the call to f sets up FP.  The relatively
>> long-running loop runs with the caller's FP.
>>
>> I hope we can go for a target-independent position that what HJ*s
>> patch does is OK...
>>
>
> In light of this,  I am resubmitting my patch.  I added 3 more testcases
> and also handle:
>
> typedef int v8si __attribute__ ((vector_size (32)));
>
> void
> foo (v8si *out_start, v8si *out_end, v8si *regions)
> {
>     v8si base = regions[3];
>     *out_start = base;
>     *out_end = base;
> }
>
> OK for trunk?

I think that the patch doesn't worsen the situation with FP debugging,
a couple of cases were presented where function operates on the caller
frame. Let's wait a bit for a counter-examples, where the patch hurts
debugging. IMO, the patch is the way to go, as shrink-wrapping is more
toxic than presented patch.

Uros.
Richard Sandiford Aug. 8, 2017, 5:34 p.m. UTC | #3
Richard Biener <richard.guenther@gmail.com> writes:
> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>><richard.sandiford@linaro.org> wrote:
>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>>>>>> this optimization removes more than 730
>>>>>>
>>>>>> pushq %rbp
>>>>>> movq %rsp, %rbp
>>>>>> popq %rbp
>>>>>
>>>>> If you don't want the frame pointer, why are you compiling with
>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>> -fforce-no-omit-frame-pointer or something similar so that people
>>can
>>>>> actually get what they are asking for?  This doesn't really make
>>sense.
>>>>> It is perfectly fine to omit frame pointer by default, when it
>>isn't
>>>>> required for something, but if the user asks for it, we shouldn't
>>ignore his
>>>>> request.
>>>>>
>>>>
>>>>
>>>> wanting a framepointer is very nice and desired...  ... but if the
>>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>>> portion, (it does it for cases like below as well), the value is
>>>> already negative for these functions that don't have stack use.
>>>>
>>>> <MPIDU_Sched_are_pending@@Base>:
>>>> mov    all_schedules@@Base-0x38460,%rax
>>>> push   %rbp
>>>> mov    %rsp,%rbp
>>>> pop    %rbp
>>>> cmpq   $0x0,(%rax)
>>>> setne  %al
>>>> movzbl %al,%eax
>>>> retq
>>>
>>> Yeah, and it could be even weirder for big single-block functions.
>>> I think GCC has been doing this kind of scheduling of prologue and
>>> epilogue instructions for a while, so there hasn*t really been a
>>> guarantee which parts of the function will have a new FP and which
>>> will still have the old one.
>>>
>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>shrink-wrapping
>>> kicks in when the following is compiled with -O3
>>-fno-omit-frame-pointer:
>>>
>>>     void f (int *);
>>>     void
>>>     g (int *x)
>>>     {
>>>       for (int i = 0; i < 1000; ++i)
>>>         x[i] += 1;
>>>       if (x[0])
>>>         {
>>>           int temp;
>>>           f (&temp);
>>>         }
>>>     }
>>>
>>> so only the block with the call to f sets up FP.  The relatively
>>> long-running loop runs with the caller's FP.
>>>
>>> I hope we can go for a target-independent position that what HJ*s
>>> patch does is OK...
>>>
>>
>>In light of this,  I am resubmitting my patch.  I added 3 more
>>testcases
>>and also handle:
>>
>>typedef int v8si __attribute__ ((vector_size (32)));
>>
>>void
>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>{
>>    v8si base = regions[3];
>>    *out_start = base;
>>    *out_end = base;
>>}
>>
>>OK for trunk?
>
> The invoker specified -fno-omit-frame-pointer, why did you eliminate it?
> I'd argue it's OK when neither -f nor -fno- is explicitly specified
> irrespective of the default in case we document the change but an
> explicit -fno- is pretty clear.

I don't buy that we're ignoring the user.  -fomit-frame-pointer says
that, when you're creating a frame, it's OK not to set up the frame
pointer.  Forcing it off means that if you create a frame, you need
to set up the frame pointer too.  But it doesn't say anything about
whether the frame itself is needed.  I.e. it's -fno-omit-frame*-pointer*
rather than -fno-omit-frame.

It seems like the responses have been treating it more like
a combination of:

-fno-shrink-wrapping
-fno-omit-frame-pointer
the equivalent of the old textual prologues and epilogues

but the positive option -fomit-frame-pointer doesn't have any effect
on the last two.

Thanks,
Richard
Richard Sandiford Aug. 8, 2017, 5:36 p.m. UTC | #4
Richard Sandiford <richard.sandiford@linaro.org> writes:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>>><richard.sandiford@linaro.org> wrote:
>>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>>> When Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer.
>>>>>>> this optimization removes more than 730
>>>>>>>
>>>>>>> pushq %rbp
>>>>>>> movq %rsp, %rbp
>>>>>>> popq %rbp
>>>>>>
>>>>>> If you don't want the frame pointer, why are you compiling with
>>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>>> -fforce-no-omit-frame-pointer or something similar so that people
>>>can
>>>>>> actually get what they are asking for?  This doesn't really make
>>>sense.
>>>>>> It is perfectly fine to omit frame pointer by default, when it
>>>isn't
>>>>>> required for something, but if the user asks for it, we shouldn't
>>>ignore his
>>>>>> request.
>>>>>>
>>>>>
>>>>>
>>>>> wanting a framepointer is very nice and desired...  ... but if the
>>>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>>>> portion, (it does it for cases like below as well), the value is
>>>>> already negative for these functions that don't have stack use.
>>>>>
>>>>> <MPIDU_Sched_are_pending@@Base>:
>>>>> mov    all_schedules@@Base-0x38460,%rax
>>>>> push   %rbp
>>>>> mov    %rsp,%rbp
>>>>> pop    %rbp
>>>>> cmpq   $0x0,(%rax)
>>>>> setne  %al
>>>>> movzbl %al,%eax
>>>>> retq
>>>>
>>>> Yeah, and it could be even weirder for big single-block functions.
>>>> I think GCC has been doing this kind of scheduling of prologue and
>>>> epilogue instructions for a while, so there hasn*t really been a
>>>> guarantee which parts of the function will have a new FP and which
>>>> will still have the old one.
>>>>
>>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>>shrink-wrapping
>>>> kicks in when the following is compiled with -O3
>>>-fno-omit-frame-pointer:
>>>>
>>>>     void f (int *);
>>>>     void
>>>>     g (int *x)
>>>>     {
>>>>       for (int i = 0; i < 1000; ++i)
>>>>         x[i] += 1;
>>>>       if (x[0])
>>>>         {
>>>>           int temp;
>>>>           f (&temp);
>>>>         }
>>>>     }
>>>>
>>>> so only the block with the call to f sets up FP.  The relatively
>>>> long-running loop runs with the caller's FP.
>>>>
>>>> I hope we can go for a target-independent position that what HJ*s
>>>> patch does is OK...
>>>>
>>>
>>>In light of this,  I am resubmitting my patch.  I added 3 more
>>>testcases
>>>and also handle:
>>>
>>>typedef int v8si __attribute__ ((vector_size (32)));
>>>
>>>void
>>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>>{
>>>    v8si base = regions[3];
>>>    *out_start = base;
>>>    *out_end = base;
>>>}
>>>
>>>OK for trunk?
>>
>> The invoker specified -fno-omit-frame-pointer, why did you eliminate it?
>> I'd argue it's OK when neither -f nor -fno- is explicitly specified
>> irrespective of the default in case we document the change but an
>> explicit -fno- is pretty clear.
>
> I don't buy that we're ignoring the user.  -fomit-frame-pointer says
> that, when you're creating a frame, it's OK not to set up the frame
> pointer.  Forcing it off means that if you create a frame, you need
> to set up the frame pointer too.  But it doesn't say anything about
> whether the frame itself is needed.  I.e. it's -fno-omit-frame*-pointer*
> rather than -fno-omit-frame.
>
> It seems like the responses have been treating it more like
> a combination of:
>
> -fno-shrink-wrapping
> -fno-omit-frame-pointer
> the equivalent of the old textual prologues and epilogues
>
> but the positive option -fomit-frame-pointer doesn't have any effect
> on the last two.

er, you know what I mean :-)  It doesn't have any effect on
-fshrink-wrapping or the textual-style prologues and epilogues.
Richard Biener Aug. 8, 2017, 6 p.m. UTC | #5
On August 8, 2017 7:36:35 PM GMT+02:00, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>Richard Sandiford <richard.sandiford@linaro.org> writes:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu"
><hjl.tools@gmail.com> wrote:
>>>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>>>><richard.sandiford@linaro.org> wrote:
>>>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>>>> When Linux/x86-64 kernel is compiled with
>-fno-omit-frame-pointer.
>>>>>>>> this optimization removes more than 730
>>>>>>>>
>>>>>>>> pushq %rbp
>>>>>>>> movq %rsp, %rbp
>>>>>>>> popq %rbp
>>>>>>>
>>>>>>> If you don't want the frame pointer, why are you compiling with
>>>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>>>> -fforce-no-omit-frame-pointer or something similar so that
>people
>>>>can
>>>>>>> actually get what they are asking for?  This doesn't really make
>>>>sense.
>>>>>>> It is perfectly fine to omit frame pointer by default, when it
>>>>isn't
>>>>>>> required for something, but if the user asks for it, we
>shouldn't
>>>>ignore his
>>>>>>> request.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> wanting a framepointer is very nice and desired...  ... but if
>the
>>>>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>>>>> portion, (it does it for cases like below as well), the value is
>>>>>> already negative for these functions that don't have stack use.
>>>>>>
>>>>>> <MPIDU_Sched_are_pending@@Base>:
>>>>>> mov    all_schedules@@Base-0x38460,%rax
>>>>>> push   %rbp
>>>>>> mov    %rsp,%rbp
>>>>>> pop    %rbp
>>>>>> cmpq   $0x0,(%rax)
>>>>>> setne  %al
>>>>>> movzbl %al,%eax
>>>>>> retq
>>>>>
>>>>> Yeah, and it could be even weirder for big single-block functions.
>>>>> I think GCC has been doing this kind of scheduling of prologue and
>>>>> epilogue instructions for a while, so there hasn*t really been a
>>>>> guarantee which parts of the function will have a new FP and which
>>>>> will still have the old one.
>>>>>
>>>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>>>shrink-wrapping
>>>>> kicks in when the following is compiled with -O3
>>>>-fno-omit-frame-pointer:
>>>>>
>>>>>     void f (int *);
>>>>>     void
>>>>>     g (int *x)
>>>>>     {
>>>>>       for (int i = 0; i < 1000; ++i)
>>>>>         x[i] += 1;
>>>>>       if (x[0])
>>>>>         {
>>>>>           int temp;
>>>>>           f (&temp);
>>>>>         }
>>>>>     }
>>>>>
>>>>> so only the block with the call to f sets up FP.  The relatively
>>>>> long-running loop runs with the caller's FP.
>>>>>
>>>>> I hope we can go for a target-independent position that what HJ*s
>>>>> patch does is OK...
>>>>>
>>>>
>>>>In light of this,  I am resubmitting my patch.  I added 3 more
>>>>testcases
>>>>and also handle:
>>>>
>>>>typedef int v8si __attribute__ ((vector_size (32)));
>>>>
>>>>void
>>>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>>>{
>>>>    v8si base = regions[3];
>>>>    *out_start = base;
>>>>    *out_end = base;
>>>>}
>>>>
>>>>OK for trunk?
>>>
>>> The invoker specified -fno-omit-frame-pointer, why did you eliminate
>it?
>>> I'd argue it's OK when neither -f nor -fno- is explicitly specified
>>> irrespective of the default in case we document the change but an
>>> explicit -fno- is pretty clear.
>>
>> I don't buy that we're ignoring the user.  -fomit-frame-pointer says
>> that, when you're creating a frame, it's OK not to set up the frame
>> pointer.  Forcing it off means that if you create a frame, you need
>> to set up the frame pointer too.  But it doesn't say anything about
>> whether the frame itself is needed.  I.e. it's
>-fno-omit-frame*-pointer*
>> rather than -fno-omit-frame.

Isn't that a bit splitting hairs if you look at (past) history?

You could also interpret -fno-omit-frame-pointer as obviously forcing a frame as otherwise there's nothing to omit...

>> It seems like the responses have been treating it more like
>> a combination of:
>>
>> -fno-shrink-wrapping
>> -fno-omit-frame-pointer
>> the equivalent of the old textual prologues and epilogues
>>
>> but the positive option -fomit-frame-pointer doesn't have any effect
>> on the last two.
>
>er, you know what I mean :-)  It doesn't have any effect on
>-fshrink-wrapping or the textual-style prologues and epilogues.

True.  But I think people do not appreciate new options too much if existing ones worked in the past...

Richard.
H.J. Lu Aug. 8, 2017, 6:29 p.m. UTC | #6
On Tue, Aug 8, 2017 at 11:00 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On August 8, 2017 7:36:35 PM GMT+02:00, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>>Richard Sandiford <richard.sandiford@linaro.org> writes:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu"
>><hjl.tools@gmail.com> wrote:
>>>>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>>>>><richard.sandiford@linaro.org> wrote:
>>>>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>>>>> When Linux/x86-64 kernel is compiled with
>>-fno-omit-frame-pointer.
>>>>>>>>> this optimization removes more than 730
>>>>>>>>>
>>>>>>>>> pushq %rbp
>>>>>>>>> movq %rsp, %rbp
>>>>>>>>> popq %rbp
>>>>>>>>
>>>>>>>> If you don't want the frame pointer, why are you compiling with
>>>>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>>>>> -fforce-no-omit-frame-pointer or something similar so that
>>people
>>>>>can
>>>>>>>> actually get what they are asking for?  This doesn't really make
>>>>>sense.
>>>>>>>> It is perfectly fine to omit frame pointer by default, when it
>>>>>isn't
>>>>>>>> required for something, but if the user asks for it, we
>>shouldn't
>>>>>ignore his
>>>>>>>> request.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> wanting a framepointer is very nice and desired...  ... but if
>>the
>>>>>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>>>>>> portion, (it does it for cases like below as well), the value is
>>>>>>> already negative for these functions that don't have stack use.
>>>>>>>
>>>>>>> <MPIDU_Sched_are_pending@@Base>:
>>>>>>> mov    all_schedules@@Base-0x38460,%rax
>>>>>>> push   %rbp
>>>>>>> mov    %rsp,%rbp
>>>>>>> pop    %rbp
>>>>>>> cmpq   $0x0,(%rax)
>>>>>>> setne  %al
>>>>>>> movzbl %al,%eax
>>>>>>> retq
>>>>>>
>>>>>> Yeah, and it could be even weirder for big single-block functions.
>>>>>> I think GCC has been doing this kind of scheduling of prologue and
>>>>>> epilogue instructions for a while, so there hasn*t really been a
>>>>>> guarantee which parts of the function will have a new FP and which
>>>>>> will still have the old one.
>>>>>>
>>>>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>>>>shrink-wrapping
>>>>>> kicks in when the following is compiled with -O3
>>>>>-fno-omit-frame-pointer:
>>>>>>
>>>>>>     void f (int *);
>>>>>>     void
>>>>>>     g (int *x)
>>>>>>     {
>>>>>>       for (int i = 0; i < 1000; ++i)
>>>>>>         x[i] += 1;
>>>>>>       if (x[0])
>>>>>>         {
>>>>>>           int temp;
>>>>>>           f (&temp);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>> so only the block with the call to f sets up FP.  The relatively
>>>>>> long-running loop runs with the caller's FP.
>>>>>>
>>>>>> I hope we can go for a target-independent position that what HJ*s
>>>>>> patch does is OK...
>>>>>>
>>>>>
>>>>>In light of this,  I am resubmitting my patch.  I added 3 more
>>>>>testcases
>>>>>and also handle:
>>>>>
>>>>>typedef int v8si __attribute__ ((vector_size (32)));
>>>>>
>>>>>void
>>>>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>>>>{
>>>>>    v8si base = regions[3];
>>>>>    *out_start = base;
>>>>>    *out_end = base;
>>>>>}
>>>>>
>>>>>OK for trunk?
>>>>
>>>> The invoker specified -fno-omit-frame-pointer, why did you eliminate
>>it?
>>>> I'd argue it's OK when neither -f nor -fno- is explicitly specified
>>>> irrespective of the default in case we document the change but an
>>>> explicit -fno- is pretty clear.
>>>
>>> I don't buy that we're ignoring the user.  -fomit-frame-pointer says
>>> that, when you're creating a frame, it's OK not to set up the frame
>>> pointer.  Forcing it off means that if you create a frame, you need
>>> to set up the frame pointer too.  But it doesn't say anything about
>>> whether the frame itself is needed.  I.e. it's
>>-fno-omit-frame*-pointer*
>>> rather than -fno-omit-frame.
>
> Isn't that a bit splitting hairs if you look at (past) history?
>
> You could also interpret -fno-omit-frame-pointer as obviously forcing a frame as otherwise there's nothing to omit...
>
>>> It seems like the responses have been treating it more like
>>> a combination of:
>>>
>>> -fno-shrink-wrapping
>>> -fno-omit-frame-pointer
>>> the equivalent of the old textual prologues and epilogues
>>>
>>> but the positive option -fomit-frame-pointer doesn't have any effect
>>> on the last two.
>>
>>er, you know what I mean :-)  It doesn't have any effect on
>>-fshrink-wrapping or the textual-style prologues and epilogues.
>
> True.  But I think people do not appreciate new options too much if existing ones worked in the past...
>

Should we also disable LTO and function inlining with -fno-omit-frame-pointer?
Both of them may eliminate frame pointer.
Richard Sandiford Aug. 9, 2017, 7:53 a.m. UTC | #7
Richard Biener <richard.guenther@gmail.com> writes:
> On August 8, 2017 7:36:35 PM GMT+02:00, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>>Richard Sandiford <richard.sandiford@linaro.org> writes:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu"
>><hjl.tools@gmail.com> wrote:
>>>>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>>>>><richard.sandiford@linaro.org> wrote:
>>>>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>>>>> When Linux/x86-64 kernel is compiled with
>>-fno-omit-frame-pointer.
>>>>>>>>> this optimization removes more than 730
>>>>>>>>>
>>>>>>>>> pushq %rbp
>>>>>>>>> movq %rsp, %rbp
>>>>>>>>> popq %rbp
>>>>>>>>
>>>>>>>> If you don't want the frame pointer, why are you compiling with
>>>>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>>>>> -fforce-no-omit-frame-pointer or something similar so that
>>people
>>>>>can
>>>>>>>> actually get what they are asking for?  This doesn't really make
>>>>>sense.
>>>>>>>> It is perfectly fine to omit frame pointer by default, when it
>>>>>isn't
>>>>>>>> required for something, but if the user asks for it, we
>>shouldn't
>>>>>ignore his
>>>>>>>> request.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> wanting a framepointer is very nice and desired...  ... but if
>>the
>>>>>>> optimizer/ins scheduler moves instructions outside of the frame'd
>>>>>>> portion, (it does it for cases like below as well), the value is
>>>>>>> already negative for these functions that don't have stack use.
>>>>>>>
>>>>>>> <MPIDU_Sched_are_pending@@Base>:
>>>>>>> mov    all_schedules@@Base-0x38460,%rax
>>>>>>> push   %rbp
>>>>>>> mov    %rsp,%rbp
>>>>>>> pop    %rbp
>>>>>>> cmpq   $0x0,(%rax)
>>>>>>> setne  %al
>>>>>>> movzbl %al,%eax
>>>>>>> retq
>>>>>>
>>>>>> Yeah, and it could be even weirder for big single-block functions.
>>>>>> I think GCC has been doing this kind of scheduling of prologue and
>>>>>> epilogue instructions for a while, so there hasn*t really been a
>>>>>> guarantee which parts of the function will have a new FP and which
>>>>>> will still have the old one.
>>>>>>
>>>>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>>>>shrink-wrapping
>>>>>> kicks in when the following is compiled with -O3
>>>>>-fno-omit-frame-pointer:
>>>>>>
>>>>>>     void f (int *);
>>>>>>     void
>>>>>>     g (int *x)
>>>>>>     {
>>>>>>       for (int i = 0; i < 1000; ++i)
>>>>>>         x[i] += 1;
>>>>>>       if (x[0])
>>>>>>         {
>>>>>>           int temp;
>>>>>>           f (&temp);
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>> so only the block with the call to f sets up FP.  The relatively
>>>>>> long-running loop runs with the caller's FP.
>>>>>>
>>>>>> I hope we can go for a target-independent position that what HJ*s
>>>>>> patch does is OK...
>>>>>>
>>>>>
>>>>>In light of this,  I am resubmitting my patch.  I added 3 more
>>>>>testcases
>>>>>and also handle:
>>>>>
>>>>>typedef int v8si __attribute__ ((vector_size (32)));
>>>>>
>>>>>void
>>>>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>>>>{
>>>>>    v8si base = regions[3];
>>>>>    *out_start = base;
>>>>>    *out_end = base;
>>>>>}
>>>>>
>>>>>OK for trunk?
>>>>
>>>> The invoker specified -fno-omit-frame-pointer, why did you eliminate
>>it?
>>>> I'd argue it's OK when neither -f nor -fno- is explicitly specified
>>>> irrespective of the default in case we document the change but an
>>>> explicit -fno- is pretty clear.
>>>
>>> I don't buy that we're ignoring the user.  -fomit-frame-pointer says
>>> that, when you're creating a frame, it's OK not to set up the frame
>>> pointer.  Forcing it off means that if you create a frame, you need
>>> to set up the frame pointer too.  But it doesn't say anything about
>>> whether the frame itself is needed.  I.e. it's
>>-fno-omit-frame*-pointer*
>>> rather than -fno-omit-frame.
>
> Isn't that a bit splitting hairs if you look at (past) history?

I guess it would have been splitting hairs in the days when they
amounted to the same thing, i.e. when there was no behaviour that
would match "-fomit-frame" and when the prologue and epilogue were
glued to the start and end of the function.  But that was quite a
long time ago.  Shrink-wrapping at least means that omitting the frame
and omitting the frame pointer are different things, and it seems
fair that -fomit-frame-pointer has followed the natural meaning.

> You could also interpret -fno-omit-frame-pointer as obviously forcing a
> frame as otherwise there's nothing to omit...

But applying that kind of interpretation to something like
-maccumulate-outgoing-args would make inlining all calls within a
function invalid, since there'd no longer be arguments to accumulate.

I think this kind of disagreement just emphasises that if we really
need a "always emit a prologue at the very start, an epilogue at the
very end, and always use a frame pointer" option, we should add it
and document exactly what the guarantees are.  I don't think
-fno-omit-frame-pointer should be it, since as the replies earlier in
the thread said, the natural meaning of that option has its uses too.

Thanks,
Richard

>
>>> It seems like the responses have been treating it more like
>>> a combination of:
>>>
>>> -fno-shrink-wrapping
>>> -fno-omit-frame-pointer
>>> the equivalent of the old textual prologues and epilogues
>>>
>>> but the positive option -fomit-frame-pointer doesn't have any effect
>>> on the last two.
>>
>>er, you know what I mean :-)  It doesn't have any effect on
>>-fshrink-wrapping or the textual-style prologues and epilogues.
>
> True.  But I think people do not appreciate new options too much if
> existing ones worked in the past...
>
> Richard.
Richard Biener Aug. 9, 2017, 11:22 a.m. UTC | #8
On August 9, 2017 9:53:05 AM GMT+02:00, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>Richard Biener <richard.guenther@gmail.com> writes:
>> On August 8, 2017 7:36:35 PM GMT+02:00, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>>Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu"
>>><hjl.tools@gmail.com> wrote:
>>>>>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>>>>>><richard.sandiford@linaro.org> wrote:
>>>>>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>>>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>>>>>> When Linux/x86-64 kernel is compiled with
>>>-fno-omit-frame-pointer.
>>>>>>>>>> this optimization removes more than 730
>>>>>>>>>>
>>>>>>>>>> pushq %rbp
>>>>>>>>>> movq %rsp, %rbp
>>>>>>>>>> popq %rbp
>>>>>>>>>
>>>>>>>>> If you don't want the frame pointer, why are you compiling
>with
>>>>>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>>>>>> -fforce-no-omit-frame-pointer or something similar so that
>>>people
>>>>>>can
>>>>>>>>> actually get what they are asking for?  This doesn't really
>make
>>>>>>sense.
>>>>>>>>> It is perfectly fine to omit frame pointer by default, when it
>>>>>>isn't
>>>>>>>>> required for something, but if the user asks for it, we
>>>shouldn't
>>>>>>ignore his
>>>>>>>>> request.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> wanting a framepointer is very nice and desired...  ... but if
>>>the
>>>>>>>> optimizer/ins scheduler moves instructions outside of the
>frame'd
>>>>>>>> portion, (it does it for cases like below as well), the value
>is
>>>>>>>> already negative for these functions that don't have stack use.
>>>>>>>>
>>>>>>>> <MPIDU_Sched_are_pending@@Base>:
>>>>>>>> mov    all_schedules@@Base-0x38460,%rax
>>>>>>>> push   %rbp
>>>>>>>> mov    %rsp,%rbp
>>>>>>>> pop    %rbp
>>>>>>>> cmpq   $0x0,(%rax)
>>>>>>>> setne  %al
>>>>>>>> movzbl %al,%eax
>>>>>>>> retq
>>>>>>>
>>>>>>> Yeah, and it could be even weirder for big single-block
>functions.
>>>>>>> I think GCC has been doing this kind of scheduling of prologue
>and
>>>>>>> epilogue instructions for a while, so there hasn*t really been a
>>>>>>> guarantee which parts of the function will have a new FP and
>which
>>>>>>> will still have the old one.
>>>>>>>
>>>>>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>>>>>shrink-wrapping
>>>>>>> kicks in when the following is compiled with -O3
>>>>>>-fno-omit-frame-pointer:
>>>>>>>
>>>>>>>     void f (int *);
>>>>>>>     void
>>>>>>>     g (int *x)
>>>>>>>     {
>>>>>>>       for (int i = 0; i < 1000; ++i)
>>>>>>>         x[i] += 1;
>>>>>>>       if (x[0])
>>>>>>>         {
>>>>>>>           int temp;
>>>>>>>           f (&temp);
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> so only the block with the call to f sets up FP.  The relatively
>>>>>>> long-running loop runs with the caller's FP.
>>>>>>>
>>>>>>> I hope we can go for a target-independent position that what
>HJ*s
>>>>>>> patch does is OK...
>>>>>>>
>>>>>>
>>>>>>In light of this,  I am resubmitting my patch.  I added 3 more
>>>>>>testcases
>>>>>>and also handle:
>>>>>>
>>>>>>typedef int v8si __attribute__ ((vector_size (32)));
>>>>>>
>>>>>>void
>>>>>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>>>>>{
>>>>>>    v8si base = regions[3];
>>>>>>    *out_start = base;
>>>>>>    *out_end = base;
>>>>>>}
>>>>>>
>>>>>>OK for trunk?
>>>>>
>>>>> The invoker specified -fno-omit-frame-pointer, why did you
>eliminate
>>>it?
>>>>> I'd argue it's OK when neither -f nor -fno- is explicitly
>specified
>>>>> irrespective of the default in case we document the change but an
>>>>> explicit -fno- is pretty clear.
>>>>
>>>> I don't buy that we're ignoring the user.  -fomit-frame-pointer
>says
>>>> that, when you're creating a frame, it's OK not to set up the frame
>>>> pointer.  Forcing it off means that if you create a frame, you need
>>>> to set up the frame pointer too.  But it doesn't say anything about
>>>> whether the frame itself is needed.  I.e. it's
>>>-fno-omit-frame*-pointer*
>>>> rather than -fno-omit-frame.
>>
>> Isn't that a bit splitting hairs if you look at (past) history?
>
>I guess it would have been splitting hairs in the days when they
>amounted to the same thing, i.e. when there was no behaviour that
>would match "-fomit-frame" and when the prologue and epilogue were
>glued to the start and end of the function.  But that was quite a
>long time ago.  Shrink-wrapping at least means that omitting the frame
>and omitting the frame pointer are different things, and it seems
>fair that -fomit-frame-pointer has followed the natural meaning.
>
>> You could also interpret -fno-omit-frame-pointer as obviously forcing
>a
>> frame as otherwise there's nothing to omit...
>
>But applying that kind of interpretation to something like
>-maccumulate-outgoing-args would make inlining all calls within a
>function invalid, since there'd no longer be arguments to accumulate.
>
>I think this kind of disagreement just emphasises that if we really
>need a "always emit a prologue at the very start, an epilogue at the
>very end, and always use a frame pointer" option, we should add it
>and document exactly what the guarantees are.  I don't think
>-fno-omit-frame-pointer should be it, since as the replies earlier in
>the thread said, the natural meaning of that option has its uses too.

OK, but then both -f[no-]omit-frame-pointer do not have clearly defined semantics and thus shouldn't be exposed to the user?

Richard.

>Thanks,
>Richard
>
>>
>>>> It seems like the responses have been treating it more like
>>>> a combination of:
>>>>
>>>> -fno-shrink-wrapping
>>>> -fno-omit-frame-pointer
>>>> the equivalent of the old textual prologues and epilogues
>>>>
>>>> but the positive option -fomit-frame-pointer doesn't have any
>effect
>>>> on the last two.
>>>
>>>er, you know what I mean :-)  It doesn't have any effect on
>>>-fshrink-wrapping or the textual-style prologues and epilogues.
>>
>> True.  But I think people do not appreciate new options too much if
>> existing ones worked in the past...
>>
>> Richard.
H.J. Lu Aug. 9, 2017, 11:31 a.m. UTC | #9
On Wed, Aug 9, 2017 at 4:22 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On August 9, 2017 9:53:05 AM GMT+02:00, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>>Richard Biener <richard.guenther@gmail.com> writes:
>>> On August 8, 2017 7:36:35 PM GMT+02:00, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>>Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>> On August 8, 2017 6:38:30 PM GMT+02:00, "H.J. Lu"
>>>><hjl.tools@gmail.com> wrote:
>>>>>>>On Mon, Aug 7, 2017 at 1:05 PM, Richard Sandiford
>>>>>>><richard.sandiford@linaro.org> wrote:
>>>>>>>> Arjan van de Ven <arjan@linux.intel.com> writes:
>>>>>>>>> On 8/7/2017 8:43 AM, Jakub Jelinek wrote:
>>>>>>>>>> On Mon, Aug 07, 2017 at 08:39:24AM -0700, H.J. Lu wrote:
>>>>>>>>>>> When Linux/x86-64 kernel is compiled with
>>>>-fno-omit-frame-pointer.
>>>>>>>>>>> this optimization removes more than 730
>>>>>>>>>>>
>>>>>>>>>>> pushq %rbp
>>>>>>>>>>> movq %rsp, %rbp
>>>>>>>>>>> popq %rbp
>>>>>>>>>>
>>>>>>>>>> If you don't want the frame pointer, why are you compiling
>>with
>>>>>>>>>> -fno-omit-frame-pointer?  Are you going to add
>>>>>>>>>> -fforce-no-omit-frame-pointer or something similar so that
>>>>people
>>>>>>>can
>>>>>>>>>> actually get what they are asking for?  This doesn't really
>>make
>>>>>>>sense.
>>>>>>>>>> It is perfectly fine to omit frame pointer by default, when it
>>>>>>>isn't
>>>>>>>>>> required for something, but if the user asks for it, we
>>>>shouldn't
>>>>>>>ignore his
>>>>>>>>>> request.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> wanting a framepointer is very nice and desired...  ... but if
>>>>the
>>>>>>>>> optimizer/ins scheduler moves instructions outside of the
>>frame'd
>>>>>>>>> portion, (it does it for cases like below as well), the value
>>is
>>>>>>>>> already negative for these functions that don't have stack use.
>>>>>>>>>
>>>>>>>>> <MPIDU_Sched_are_pending@@Base>:
>>>>>>>>> mov    all_schedules@@Base-0x38460,%rax
>>>>>>>>> push   %rbp
>>>>>>>>> mov    %rsp,%rbp
>>>>>>>>> pop    %rbp
>>>>>>>>> cmpq   $0x0,(%rax)
>>>>>>>>> setne  %al
>>>>>>>>> movzbl %al,%eax
>>>>>>>>> retq
>>>>>>>>
>>>>>>>> Yeah, and it could be even weirder for big single-block
>>functions.
>>>>>>>> I think GCC has been doing this kind of scheduling of prologue
>>and
>>>>>>>> epilogue instructions for a while, so there hasn*t really been a
>>>>>>>> guarantee which parts of the function will have a new FP and
>>which
>>>>>>>> will still have the old one.
>>>>>>>>
>>>>>>>> Also, with an arbitrarily-picked host compiler (GCC 6.3.1),
>>>>>>>shrink-wrapping
>>>>>>>> kicks in when the following is compiled with -O3
>>>>>>>-fno-omit-frame-pointer:
>>>>>>>>
>>>>>>>>     void f (int *);
>>>>>>>>     void
>>>>>>>>     g (int *x)
>>>>>>>>     {
>>>>>>>>       for (int i = 0; i < 1000; ++i)
>>>>>>>>         x[i] += 1;
>>>>>>>>       if (x[0])
>>>>>>>>         {
>>>>>>>>           int temp;
>>>>>>>>           f (&temp);
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>> so only the block with the call to f sets up FP.  The relatively
>>>>>>>> long-running loop runs with the caller's FP.
>>>>>>>>
>>>>>>>> I hope we can go for a target-independent position that what
>>HJ*s
>>>>>>>> patch does is OK...
>>>>>>>>
>>>>>>>
>>>>>>>In light of this,  I am resubmitting my patch.  I added 3 more
>>>>>>>testcases
>>>>>>>and also handle:
>>>>>>>
>>>>>>>typedef int v8si __attribute__ ((vector_size (32)));
>>>>>>>
>>>>>>>void
>>>>>>>foo (v8si *out_start, v8si *out_end, v8si *regions)
>>>>>>>{
>>>>>>>    v8si base = regions[3];
>>>>>>>    *out_start = base;
>>>>>>>    *out_end = base;
>>>>>>>}
>>>>>>>
>>>>>>>OK for trunk?
>>>>>>
>>>>>> The invoker specified -fno-omit-frame-pointer, why did you
>>eliminate
>>>>it?
>>>>>> I'd argue it's OK when neither -f nor -fno- is explicitly
>>specified
>>>>>> irrespective of the default in case we document the change but an
>>>>>> explicit -fno- is pretty clear.
>>>>>
>>>>> I don't buy that we're ignoring the user.  -fomit-frame-pointer
>>says
>>>>> that, when you're creating a frame, it's OK not to set up the frame
>>>>> pointer.  Forcing it off means that if you create a frame, you need
>>>>> to set up the frame pointer too.  But it doesn't say anything about
>>>>> whether the frame itself is needed.  I.e. it's
>>>>-fno-omit-frame*-pointer*
>>>>> rather than -fno-omit-frame.
>>>
>>> Isn't that a bit splitting hairs if you look at (past) history?
>>
>>I guess it would have been splitting hairs in the days when they
>>amounted to the same thing, i.e. when there was no behaviour that
>>would match "-fomit-frame" and when the prologue and epilogue were
>>glued to the start and end of the function.  But that was quite a
>>long time ago.  Shrink-wrapping at least means that omitting the frame
>>and omitting the frame pointer are different things, and it seems
>>fair that -fomit-frame-pointer has followed the natural meaning.
>>
>>> You could also interpret -fno-omit-frame-pointer as obviously forcing
>>a
>>> frame as otherwise there's nothing to omit...
>>
>>But applying that kind of interpretation to something like
>>-maccumulate-outgoing-args would make inlining all calls within a
>>function invalid, since there'd no longer be arguments to accumulate.
>>
>>I think this kind of disagreement just emphasises that if we really
>>need a "always emit a prologue at the very start, an epilogue at the
>>very end, and always use a frame pointer" option, we should add it
>>and document exactly what the guarantees are.  I don't think
>>-fno-omit-frame-pointer should be it, since as the replies earlier in
>>the thread said, the natural meaning of that option has its uses too.
>
> OK, but then both -f[no-]omit-frame-pointer do not have clearly defined semantics and thus shouldn't be exposed to the user?
>

-f[no-]omit-frame-pointer apply to cases where a new stack frame
is needed.  -fno-omit-frame-pointer allows you to unwind each
stack frame, not necessarily each function, via frame pointer.
 -fno-omit-frame-pointer may not create a new stack frame for
each function, similar to LTO or function inlining.  But you can still
unwind via frame pointer.  We never guarantee that we will create
a new stack frame for each function.  Some functions are inlined
completely.  Some just use the caller's stack frame.
Michael Matz Aug. 9, 2017, 11:59 a.m. UTC | #10
Hi,

On Wed, 9 Aug 2017, H.J. Lu wrote:

> > OK, but then both -f[no-]omit-frame-pointer do not have clearly defined semantics and thus shouldn't be exposed to the user?
> 
> -f[no-]omit-frame-pointer apply to cases where a new stack frame
> is needed.  -fno-omit-frame-pointer allows you to unwind each
> stack frame, not necessarily each function, via frame pointer.
>  -fno-omit-frame-pointer may not create a new stack frame for
> each function, similar to LTO or function inlining.  But you can still
> unwind via frame pointer.  We never guarantee that we will create
> a new stack frame for each function.  Some functions are inlined
> completely.  Some just use the caller's stack frame.

Maybe something along these lines should be added to the docu of 
fno-omit-frame-pointer then.  At least "Note that -fno-omit-frame-pointer 
doesn't force a frame for all functions if it isn't otherwise needed, and 
hence doesn't guarantee a frame pointer for all functions." .


Ciao,
Michael.
diff mbox

Patch

From d18008a8052973ab8582a1662f42669a9d318d0d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 6 Aug 2017 06:24:36 -0700
Subject: [PATCH] i386: Don't use frame pointer without stack access

When there is no stack access, there is no need to use frame pointer
even if -fno-omit-frame-pointer is used.

gcc/

	PR target/81736
	* config/i386/i386.c (ix86_finalize_stack_realign_flags): Renamed
	to ...
	(ix86_finalize_stack_frame_flags): This.  Also clear
	frame_pointer_needed if -fno-omit-frame-pointer is used without
	stack access.
	(ix86_expand_prologue): Replace ix86_finalize_stack_realign_flags
	with ix86_finalize_stack_frame_flags.
	(ix86_expand_epilogue): Likewise.
	(ix86_expand_split_stack_prologue): Likewise.

gcc/testsuite/

	PR target/81736
	* gcc.target/i386/pr81736-1.c: New test.
	* gcc.target/i386/pr81736-2.c: Likewise.
	* gcc.target/i386/pr81736-3.c: Likewise.
	* gcc.target/i386/pr81736-4.c: Likewise.
	* gcc.target/i386/pr81736-5.c: Likewise.
	* gcc.target/i386/pr81736-6.c: Likewise.
	* gcc.target/i386/pr81736-7.c: Likewise.
---
 gcc/config/i386/i386.c                    | 23 ++++++++++++-----------
 gcc/testsuite/gcc.target/i386/pr81736-1.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr81736-2.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/pr81736-3.c | 11 +++++++++++
 gcc/testsuite/gcc.target/i386/pr81736-4.c | 11 +++++++++++
 gcc/testsuite/gcc.target/i386/pr81736-5.c | 20 ++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr81736-6.c | 16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr81736-7.c | 13 +++++++++++++
 8 files changed, 110 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81736-7.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index dfef996e36c..ed51595298d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14116,10 +14116,11 @@  output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
-/* Finalize stack_realign_needed flag, which will guide prologue/epilogue
-   to be generated in correct form.  */
+/* Finalize stack_realign_needed and frame_pointer_needed flags, which
+   will guide prologue/epilogue to be generated in correct form.  */
+
 static void
-ix86_finalize_stack_realign_flags (void)
+ix86_finalize_stack_frame_flags (void)
 {
   /* Check if stack realign is really needed after reload, and
      stores result in cfun */
@@ -14142,13 +14143,13 @@  ix86_finalize_stack_realign_flags (void)
     }
 
   /* If the only reason for frame_pointer_needed is that we conservatively
-     assumed stack realignment might be needed, but in the end nothing that
-     needed the stack alignment had been spilled, clear frame_pointer_needed
-     and say we don't need stack realignment.  */
-  if (stack_realign
+     assumed stack realignment might be needed or -fno-omit-frame-pointer
+     is used, but in the end nothing that needed the stack alignment had
+     been spilled nor stack access, clear frame_pointer_needed and say we
+     don't need stack realignment.  */
+  if ((stack_realign || !flag_omit_frame_pointer)
       && frame_pointer_needed
       && crtl->is_leaf
-      && flag_omit_frame_pointer
       && crtl->sp_is_unchanging
       && !ix86_current_function_calls_tls_descriptor
       && !crtl->accesses_prior_frames
@@ -14339,7 +14340,7 @@  ix86_expand_prologue (void)
   if (ix86_function_naked (current_function_decl))
     return;
 
-  ix86_finalize_stack_realign_flags ();
+  ix86_finalize_stack_frame_flags ();
 
   /* DRAP should not coexist with stack_realign_fp */
   gcc_assert (!(crtl->drap_reg && stack_realign_fp));
@@ -15203,7 +15204,7 @@  ix86_expand_epilogue (int style)
       return;
     }
 
-  ix86_finalize_stack_realign_flags ();
+  ix86_finalize_stack_frame_flags ();
   frame = m->frame;
 
   m->fs.sp_realigned = stack_realign_fp;
@@ -15738,7 +15739,7 @@  ix86_expand_split_stack_prologue (void)
 
   gcc_assert (flag_split_stack && reload_completed);
 
-  ix86_finalize_stack_realign_flags ();
+  ix86_finalize_stack_frame_flags ();
   frame = cfun->machine->frame;
   allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-1.c b/gcc/testsuite/gcc.target/i386/pr81736-1.c
new file mode 100644
index 00000000000..92c7bc97a0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-1.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+extern int i;
+
+int
+foo (void)
+{
+  return i;
+}
+
+/* No need to use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-2.c b/gcc/testsuite/gcc.target/i386/pr81736-2.c
new file mode 100644
index 00000000000..a3720879937
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-2.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+int
+#ifndef __x86_64__
+__attribute__((regparm(3)))
+#endif
+foo (int i)
+{
+  return i;
+}
+
+/* No need to use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-3.c b/gcc/testsuite/gcc.target/i386/pr81736-3.c
new file mode 100644
index 00000000000..c3bde7dd933
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-3.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+void
+foo (void)
+{
+  asm ("# " : : : "ebx");
+}
+
+/* Need to use a frame pointer.  */
+/* { dg-final { scan-assembler "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-4.c b/gcc/testsuite/gcc.target/i386/pr81736-4.c
new file mode 100644
index 00000000000..25f50016a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-4.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+int
+foo (int i1, int i2, int i3, int i4, int i5, int i6, int i7)
+{
+  return i7;
+}
+
+/* Need to use a frame pointer.  */
+/* { dg-final { scan-assembler "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-5.c b/gcc/testsuite/gcc.target/i386/pr81736-5.c
new file mode 100644
index 00000000000..e1602cf25ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-5.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer -mavx" } */
+
+typedef int v8si __attribute__ ((vector_size (32)));
+
+void
+#ifndef __x86_64__
+__attribute__((regparm(3)))
+#endif
+foo (v8si *out_start, v8si *out_end, v8si *regions)
+{
+  v8si base = regions[3];
+  *out_start = base;
+  *out_end = base;
+}
+
+/* No need to use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
+/* Verify no dynamic realignment is performed.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-6.c b/gcc/testsuite/gcc.target/i386/pr81736-6.c
new file mode 100644
index 00000000000..6198574c8cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-6.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+struct foo
+{
+  int head;
+} a;
+
+int
+bar (void)
+{
+  return a.head != 0;
+}
+
+/* No need to use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr81736-7.c b/gcc/testsuite/gcc.target/i386/pr81736-7.c
new file mode 100644
index 00000000000..f947886e642
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81736-7.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer" } */
+
+extern int foo (void);
+
+int
+bar (void)
+{
+  return foo ();
+}
+
+/* No need to use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
-- 
2.13.4