[1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc

Message ID 4b44888ff2f1237b364ccd0a566a4b739db0a032.1470015604.git.segher@kernel.crashing.org
State New

Commit Message

Segher Boessenkool Aug. 1, 2016, 1:42 a.m. UTC
This patch adds a new command-line flag "-fshrink-wrap-separate", a status
flag "shrink_wrapped_separate", hooks for abstracting the target components,
and documentation for all those.

2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* common.opt (-fshrink-wrap-separate): New flag.
	* doc/invoke.texi: Document it.
	* doc/tm.texi.in (Shrink-wrapping separate components): New subsection.
	* doc/tm.texi: Regenerate.
	* emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate.
	* target.def (shrink_wrap): New hook vector.
	(get_separate_components, components_for_bb, disqualify_components,
	emit_prologue_components, emit_epilogue_components,
	set_handled_components): New hooks.
---
 gcc/common.opt      |  4 ++++
 gcc/doc/invoke.texi | 11 ++++++++++-
 gcc/doc/tm.texi     | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/doc/tm.texi.in  | 29 +++++++++++++++++++++++++++
 gcc/emit-rtl.h      |  4 ++++
 gcc/target.def      | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 158 insertions(+), 1 deletion(-)

Comments

Bernd Schmidt Aug. 29, 2016, 9:31 a.m. UTC | #1
On 08/01/2016 03:42 AM, Segher Boessenkool wrote:
> +@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
> +Emit prologue insns for the components indicated by the parameter.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS (sbitmap)
> +Emit epilogue insns for the components indicated by the parameter.
> +@end deftypefn

How do these actually know where to save/restore registers? The frame 
pointer may have been eliminated, and SP isn't necessarily constant 
during the function. Seems like you'd have to calculate CFA reg/offset 
much like dwarf2out does and pass it to this hook.


Bernd
Segher Boessenkool Aug. 29, 2016, 2:30 p.m. UTC | #2
On Mon, Aug 29, 2016 at 11:31:51AM +0200, Bernd Schmidt wrote:
> On 08/01/2016 03:42 AM, Segher Boessenkool wrote:
> >+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS 
> >(sbitmap)
> >+Emit prologue insns for the components indicated by the parameter.
> >+@end deftypefn
> >+
> >+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS 
> >(sbitmap)
> >+Emit epilogue insns for the components indicated by the parameter.
> >+@end deftypefn
> 
> How do these actually know where to save/restore registers? The frame 
> pointer may have been eliminated, and SP isn't necessarily constant 
> during the function. Seems like you'd have to calculate CFA reg/offset 
> much like dwarf2out does and pass it to this hook.

There are many other reasons why separate shrink-wrapping can not be
done for a certain function, too; some target-specific.

The generic code (patch 8/9) does

+  /* We don't handle "strange" functions.  */
+  if (cfun->calls_alloca
+      || cfun->calls_setjmp
+      || cfun->can_throw_non_call_exceptions
+      || crtl->calls_eh_return
+      || crtl->has_nonlocal_goto
+      || crtl->saves_all_registers)
+    return;
+
+  /* Ask the target what components there are.  If it returns NULL, don't
+     do anything.  */
+  sbitmap components = targetm.shrink_wrap.get_separate_components ();

and the rs6000 version of TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
(patch 9/9) then does

+  rs6000_stack_t *info = rs6000_stack_info ();
+
+  if (!(info->savres_strategy & SAVE_INLINE_GPRS)
+      || !(info->savres_strategy & REST_INLINE_GPRS)
+      || WORLD_SAVE_P (info))
+    return NULL;

and also checks for each component if it can access the save slot with
just one instruction.  It can handle both the stack pointer (r1) and
the hard frame pointer (r31); it uses the same logic as the "ordinary"
prologue/epilogue would.


Segher
Jeff Law Sept. 8, 2016, 5:20 p.m. UTC | #3
On 08/29/2016 03:31 AM, Bernd Schmidt wrote:
> On 08/01/2016 03:42 AM, Segher Boessenkool wrote:
>> +@deftypefn {Target Hook} void
>> TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
>> +Emit prologue insns for the components indicated by the parameter.
>> +@end deftypefn
>> +
>> +@deftypefn {Target Hook} void
>> TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS (sbitmap)
>> +Emit epilogue insns for the components indicated by the parameter.
>> +@end deftypefn
>
> How do these actually know where to save/restore registers? The frame
> pointer may have been eliminated, and SP isn't necessarily constant
> during the function. Seems like you'd have to calculate CFA reg/offset
> much like dwarf2out does and pass it to this hook.
So I think the confusion here is these hooks are independent of 
placement. ie, the target independent code does something like:

FOR_EACH_BB
   Build the component bitmap using the incoming edge components
   Emit the prologue components at the start of the block
   Emit the epilogue components at the end of the block


The components handled by a particular block start are set/cleared by 
the other hooks.
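
A toy model of the per-block step outlined above, not the code from patch 8/9. It assumes each block records which components are already active on entry, which it needs itself, and which must stay active on exit. The names `component_set`, `struct block_info`, and `plan_block` are hypothetical, and a plain integer stands in for `sbitmap`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sbitmap: bit N set means component N.  */
typedef uint32_t component_set;

/* Hypothetical per-block summary; a real pass would derive this from
   the CFG and the target hooks.  */
struct block_info
{
  component_set active_in;   /* Components already set up on entry.  */
  component_set needed;      /* Components the block itself requires.  */
  component_set active_out;  /* Components that must stay set up on exit.  */
};

/* Compute which components get a prologue at the start of BB and which
   get an epilogue at its end.  */
static void
plan_block (const struct block_info *bb,
            component_set *prologue, component_set *epilogue)
{
  component_set live = bb->active_in | bb->needed;

  /* Nothing can be live on exit unless it was set up on entry or in BB.  */
  assert ((bb->active_out & ~live) == 0);

  *prologue = bb->needed & ~bb->active_in;
  *epilogue = live & ~bb->active_out;
}
```

In this model the two result sets are what would be handed to the emit_prologue_components and emit_epilogue_components hooks for that block; the hooks themselves never decide placement.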

jeff
Jeff Law Sept. 8, 2016, 5:37 p.m. UTC | #4
On 07/31/2016 07:42 PM, Segher Boessenkool wrote:
> This patch adds a new command-line flag "-fshrink-wrap-separate", a status
> flag "shrink_wrapped_separate", hooks for abstracting the target components,
> and documentation for all those.
>
> 2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>
>
> 	* common.opt (-fshrink-wrap-separate): New flag.
> 	* doc/invoke.texi: Document it.
> 	* doc/tm.texi.in (Shrink-wrapping separate components): New subsection.
> 	* doc/tm.texi: Regenerate.
> 	* emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate.
> 	* target.def (shrink_wrap): New hook vector.
> 	(get_separate_components, components_for_bb, disqualify_components,
> 	emit_prologue_components, emit_epilogue_components,
> 	set_handled_components): New hooks.
> ---
>  gcc/common.opt      |  4 ++++
>  gcc/doc/invoke.texi | 11 ++++++++++-
>  gcc/doc/tm.texi     | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/doc/tm.texi.in  | 29 +++++++++++++++++++++++++++
>  gcc/emit-rtl.h      |  4 ++++
>  gcc/target.def      | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 158 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 8a292ed..97d305f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 9edb006..5a5c5cab 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -4852,6 +4853,59 @@ This hook should add additional registers that are computed by the prologue to t
>  True if a function's return statements should be checked for matching the function's return type.  This includes checking for falling off the end of a non-void function.  Return false if no such check should be made.
>  @end deftypefn
>
> +@node Shrink-wrapping separate components
> +@subsection Shrink-wrapping separate components
> +@cindex shrink-wrapping separate components
> +
> +The prologue does a lot of separate things: save callee-saved registers,
> +do whatever needs to be done to be able to call things (save the return
> +address, align the stack, whatever; different for each target), set up a
> +stack frame, do whatever needs to be done for the static chain (if anything),
> +set up registers for PIC, etc.
The prologue may perform a variety of target-dependent tasks such as
saving callee-saved registers, saving the return address, aligning the
stack, creating a local stack frame, initializing the PIC register, etc.

On some targets some of these tasks may be independent of others and 
thus may be shrink-wrapped separately.  These independent tasks are 
referred to as components and are handled generically by the target 
independent parts of GCC.

Each component has a slot in an sbitmap that is generated and maintained
for each basic block.
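
A minimal sketch of "one slot per component", assuming a fixed-width integer in place of `sbitmap`. The component numbering (`COMPONENT_SAVE_R30` and friends) and the helper names are hypothetical; a real target chooses its own components.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical component numbering; a real target chooses its own.  */
enum component
{
  COMPONENT_SAVE_R30,
  COMPONENT_SAVE_R31,
  COMPONENT_SAVE_LR,
  N_COMPONENTS
};

/* Stand-in for sbitmap: bit N is the slot for component N.  */
typedef uint32_t component_set;

/* Mark component C as present in *SET.  */
static void
set_component (component_set *set, enum component c)
{
  assert (c < N_COMPONENTS);
  *set |= (component_set) 1 << c;
}

/* Return true if component C is present in SET.  */
static bool
has_component (component_set set, enum component c)
{
  assert (c < N_COMPONENTS);
  return (set >> c) & 1;
}
```

Keeping one such set per basic block (for example in an array indexed by block number) gives the generic code everything it needs without knowing what any individual bit means.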



> +@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS (void)
> +This hook should return an @code{sbitmap} with the bits set for those
> +components that can be separately shrink-wrapped in the current function.
> +Return @code{NULL} if the current function should not get any separate
> +shrink-wrapping.
> +Don't define this hook if it would always return @code{NULL}.
> +If it is defined, the other hooks in this group have to be defined as well.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB (basic_block)
> +This hook should return an @code{sbitmap} with the bits set for those
> +components where either the prologue component has to be executed before
> +the @code{basic_block}, or the epilogue component after it, or both.
> +@end deftypefn
Who is responsible for allocating and releasing the sbitmaps?


I don't have major concerns with this patch -- I'd like to see 
clarification done on the ownership of the sbitmaps (ie, who allocates 
and releases those objects).  I'd like to see if we can get a better 
introduction as well.
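
The patch text states an ownership rule only for set_handled_components: the target must not hang on to the bitmap, because it is deleted after the call. A sketch of that one convention follows, with a heap-allocated `bool` array standing in for `sbitmap`; `handled`, `mark_handled`, and `N_COMPONENTS` are hypothetical names, and nothing here says how the other hooks allocate or release their bitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define N_COMPONENTS 8

/* Target-owned record of handled components; it outlives any bitmap
   passed in by the caller.  */
static bool handled[N_COMPONENTS];

/* Model of a set_handled_components hook: copy the bits, keep no pointer.  */
static void
set_handled_components (const bool *components)
{
  memcpy (handled, components, sizeof handled);
}

/* Model of the generic caller: allocate, call the hook, release.  */
static void
mark_handled (int component)
{
  assert (component >= 0 && component < N_COMPONENTS);
  bool *components = calloc (N_COMPONENTS, sizeof *components);
  if (!components)
    abort ();
  components[component] = true;
  set_handled_components (components);
  free (components);  /* The hook must not rely on this memory afterwards.  */
}
```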

Jeff
Bernd Schmidt Sept. 9, 2016, 10:58 a.m. UTC | #5
On 09/08/2016 07:20 PM, Jeff Law wrote:
> On 08/29/2016 03:31 AM, Bernd Schmidt wrote:
>> How do these actually know where to save/restore registers? The frame
>> pointer may have been eliminated, and SP isn't necessarily constant
>> during the function. Seems like you'd have to calculate CFA reg/offset
>> much like dwarf2out does and pass it to this hook.
> So I think the confusion here is these hooks are independent of
> placement. ie, the target independent code does something like:
>
> FOR_EACH_BB
>   Build the component bitmap using the incoming edge components
>   Emit the prologue components at the start of the block
>   Emit the epilogue components at the end of the block
>
>
> The components handled by a particular block start are set/cleared by
> the other hooks.

Hmm? The problem is that you can't generally emit a save/restore 
independent of placement, because you may not know which offset to use 
from whichever base register. But these offsets aren't necessarily 
constant throughout the function. Segher explained that the algorithm 
deals with this by giving up in many cases, which of course limits the 
usefulness. It probably makes it unusable entirely on targets that want 
to use pushes for function args.


Bernd
Segher Boessenkool Sept. 9, 2016, 3:04 p.m. UTC | #6
On Fri, Sep 09, 2016 at 12:58:11PM +0200, Bernd Schmidt wrote:
> Hmm? The problem is that you can't generally emit a save/restore 
> independent of placement, because you may not know which offset to use 
> from whichever base register. But these offsets aren't necessarily 
> constant throughout the function. Segher explained that the algorithm 
> deals with this by giving up in many cases, which of course limits the 
> usefulness. It probably makes it unusable entirely on targets that want 
> to use pushes for function args.

It's not the generic algorithm that gives up; it's the target hook.
Specifically, at least for the PowerPC one I wrote, the
TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS hook gives up on register
components that need an offset from the base reg (stack or frame pointer)
that cannot be used in a single instruction (i.e. won't fit in 16 bits).
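
A sketch of that filtering step, not the rs6000 code from patch 9/9. It assumes the single-instruction limit is a signed 16-bit displacement and that each register's save-slot offset is already known; `component_set`, `fits_in_16_bits`, and `separate_components` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sbitmap: bit N means register N's save.  */
typedef uint64_t component_set;

/* True if OFFSET fits a signed 16-bit displacement field.  */
static bool
fits_in_16_bits (long offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* Return the registers whose save slot can be reached from the base
   register with a single load or store.  SAVE_OFFSET[REG] is the slot
   offset for register REG.  */
static component_set
separate_components (const long *save_offset, int n_regs)
{
  assert (n_regs >= 0 && n_regs <= 64);

  component_set components = 0;
  for (int reg = 0; reg < n_regs; reg++)
    if (fits_in_16_bits (save_offset[reg]))
      components |= (component_set) 1 << reg;
  return components;
}
```

Registers that fail the check are simply left out of the returned set, so the ordinary prologue and epilogue keep handling them.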

The generic code only does

+  /* We don't handle "strange" functions.  */
+  if (cfun->calls_alloca
+      || cfun->calls_setjmp
+      || cfun->can_throw_non_call_exceptions
+      || crtl->calls_eh_return
+      || crtl->has_nonlocal_goto
+      || crtl->saves_all_registers)
+    return;

so that does not give up in "many" cases.

Targets that push function args can handle things fine as far as I see?
Targets that normally use push insns in the prologue will just have to
not do that for the components that are separately wrapped.  Or they can
still use pushes to reserve that space, if that works better.


Segher
Segher Boessenkool Sept. 9, 2016, 3:33 p.m. UTC | #7
On Thu, Sep 08, 2016 at 11:20:41AM -0600, Jeff Law wrote:
> On 08/29/2016 03:31 AM, Bernd Schmidt wrote:
> >On 08/01/2016 03:42 AM, Segher Boessenkool wrote:
> >>+@deftypefn {Target Hook} void
> >>TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
> >>+Emit prologue insns for the components indicated by the parameter.
> >>+@end deftypefn
> >>+
> >>+@deftypefn {Target Hook} void
> >>TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS (sbitmap)
> >>+Emit epilogue insns for the components indicated by the parameter.
> >>+@end deftypefn
> >
> >How do these actually know where to save/restore registers? The frame
> >pointer may have been eliminated, and SP isn't necessarily constant
> >during the function. Seems like you'd have to calculate CFA reg/offset
> >much like dwarf2out does and pass it to this hook.
> So I think the confusion here is these hooks are independent of 
> placement. ie, the target independent code does something like:
> 
> FOR_EACH_BB
>   Build the component bitmap using the incoming edge components
>   Emit the prologue components at the start of the block
>   Emit the epilogue components at the end of the block

So you think those hooks need a BB parameter?  If there are ports that
need that, then sure.  PowerPC doesn't need it.


Segher
Segher Boessenkool Sept. 9, 2016, 3:43 p.m. UTC | #8
On Thu, Sep 08, 2016 at 11:37:17AM -0600, Jeff Law wrote:
> I don't have major concerns with this patch -- I'd like to see 
> clarification done on the ownership of the sbitmaps (ie, who allocates 
> and releases those objects).  I'd like to see if we can get a better 
> introduction as well.

I'll work a bit more on improving the internals documentation, okay.

Thanks,


Segher
Jeff Law Sept. 9, 2016, 3:44 p.m. UTC | #9
On 09/09/2016 04:58 AM, Bernd Schmidt wrote:
> On 09/08/2016 07:20 PM, Jeff Law wrote:
>> On 08/29/2016 03:31 AM, Bernd Schmidt wrote:
>>> How do these actually know where to save/restore registers? The frame
>>> pointer may have been eliminated, and SP isn't necessarily constant
>>> during the function. Seems like you'd have to calculate CFA reg/offset
>>> much like dwarf2out does and pass it to this hook.
>> So I think the confusion here is these hooks are independent of
>> placement. ie, the target independent code does something like:
>>
>> FOR_EACH_BB
>>   Build the component bitmap using the incoming edge components
>>   Emit the prologue components at the start of the block
>>   Emit the epilogue components at the end of the block
>>
>>
>> The components handled by a particular block start are set/cleared by
>> the other hooks.
>
> Hmm? The problem is that you can't generally emit a save/restore
> independent of placement, because you may not know which offset to use
> from whichever base register. But these offsets aren't necessarily
> constant throughout the function. Segher explained that the algorithm
> deals with this by giving up in many cases, which of course limits the
> usefulness. It probably makes it unusable entirely on targets that want
> to use pushes for function args.
On a target with ACCUMULATE_OUTGOING_ARGS the offsets are generally 
going to be constant.  For a target that's pushing args and not using a 
frame pointer, this isn't likely to work.  But that's OK as those 
targets wouldn't define the hooks or would test for those cases within 
the given hooks.  I can envision (but will certainly not implement) 
separate shrink wrapping on m68k with a frame pointer in Segher's framework.
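
A small illustration of why the offset is stable in one case and not the other, assuming a downward-growing stack and no frame pointer. `sp_relative_offset` is a hypothetical helper, not a GCC function.

```c
#include <assert.h>

/* Offset of a save slot from the stack pointer.  OFFSET_AT_ENTRY is the
   slot's offset from SP right after the frame is allocated; BYTES_PUSHED
   is how far SP has dropped since then because of argument pushes.  */
static long
sp_relative_offset (long offset_at_entry, long bytes_pushed)
{
  assert (bytes_pushed >= 0);
  return offset_at_entry + bytes_pushed;
}
```

When outgoing arguments are accumulated in the frame, `bytes_pushed` is always zero in this model and the offset is the same at every insertion point. When arguments are pushed, the value depends on where in the function the save or restore is emitted, which is the information the hooks do not currently receive.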

Essentially the hooks push these decisions into the target machine,
which is where they belong.  Furthermore, the hooks can build a custom
sequence for each insertion point.

That allows (as an example) the PPC LR save/restore sequence to use r0 
as an intermediate register for that case.  We could use those 
mechanisms to ensure there's a scratch register at the insertion point 
for address computations on targets that need them, etc.



Jeff
Jeff Law Sept. 9, 2016, 4:51 p.m. UTC | #10
On 09/09/2016 09:33 AM, Segher Boessenkool wrote:
> On Thu, Sep 08, 2016 at 11:20:41AM -0600, Jeff Law wrote:
>> On 08/29/2016 03:31 AM, Bernd Schmidt wrote:
>>> On 08/01/2016 03:42 AM, Segher Boessenkool wrote:
>>>> +@deftypefn {Target Hook} void
>>>> TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
>>>> +Emit prologue insns for the components indicated by the parameter.
>>>> +@end deftypefn
>>>> +
>>>> +@deftypefn {Target Hook} void
>>>> TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS (sbitmap)
>>>> +Emit epilogue insns for the components indicated by the parameter.
>>>> +@end deftypefn
>>>
>>> How do these actually know where to save/restore registers? The frame
>>> pointer may have been eliminated, and SP isn't necessarily constant
>>> during the function. Seems like you'd have to calculate CFA reg/offset
>>> much like dwarf2out does and pass it to this hook.
>> So I think the confusion here is these hooks are independent of
>> placement. ie, the target independent code does something like:
>>
>> FOR_EACH_BB
>>   Build the component bitmap using the incoming edge components
>>   Emit the prologue components at the start of the block
>>   Emit the epilogue components at the end of the block
>
> So you think those hooks need a BB parameter?  If there are ports that
> need that, then sure.  PowerPC doesn't need it.
I wouldn't add it now.  If we find a port that needs it, we can do so at 
that time.  Maybe the port wants to scan the block for a scratch 
register or somesuch.  But let's wait until we actually see a need.

I was mostly trying to highlight the high level structure and that each 
block can have a prologue and/or epilogue associated with it and that 
they are specialized for each block.

jeff
Jeff Law Sept. 9, 2016, 6:28 p.m. UTC | #11
On 09/09/2016 09:04 AM, Segher Boessenkool wrote:
> On Fri, Sep 09, 2016 at 12:58:11PM +0200, Bernd Schmidt wrote:
>> Hmm? The problem is that you can't generally emit a save/restore
>> independent of placement, because you may not know which offset to use
>> from whichever base register. But these offsets aren't necessarily
>> constant throughout the function. Segher explained that the algorithm
>> deals with this by giving up in many cases, which of course limits the
>> usefulness. It probably makes it unusable entirely on targets that want
>> to use pushes for function args.
>
> It's not the generic algorithm that gives up; it's the target hook.
> Specifically, at least for the PowerPC one I wrote, the
> TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS hook gives up on register
> components that need an offset from the base reg (stack or frame pointer)
> that cannot be used in a single instruction (i.e. won't fit in 16 bits).
>
> The generic code only does
>
> +  /* We don't handle "strange" functions.  */
> +  if (cfun->calls_alloca
> +      || cfun->calls_setjmp
> +      || cfun->can_throw_non_call_exceptions
> +      || crtl->calls_eh_return
> +      || crtl->has_nonlocal_goto
> +      || crtl->saves_all_registers)
> +    return;
>
> so that does not give up in "many" cases.
Doesn't seem like a lot to me either.


>
> Targets that push function args can handle things fine as far as I see?
> Targets that normally use push insns in the prologue will just have to
> not do that for the components that are separately wrapped.  Or they can
> still use pushes to reserve that space, if that works better.
To me it's more about the fact that the offset to the slot where the 
register should be saved varies (unless you have a frame pointer) and I 
don't think there's enough information in any of the hook arguments to 
allow derivation of that offset.

But I don't consider that a flaw that should block this feature.  If 
someday we want to implement on such a target, we'll have to figure out 
how to compute that offset at an arbitrary location and pass along the 
needed information to the hooks.

jeff
Segher Boessenkool Sept. 9, 2016, 8:33 p.m. UTC | #12
On Fri, Sep 09, 2016 at 12:28:12PM -0600, Jeff Law wrote:
> >The generic code only does
> >
> >+  /* We don't handle "strange" functions.  */
> >+  if (cfun->calls_alloca
> >+      || cfun->calls_setjmp
> >+      || cfun->can_throw_non_call_exceptions
> >+      || crtl->calls_eh_return
> >+      || crtl->has_nonlocal_goto
> >+      || crtl->saves_all_registers)
> >+    return;
> >
> >so that does not give up in "many" cases.
> Doesn't seem like a lot to me either.

A few of those could be handled, perhaps with some extra hooks, but it
didn't look useful to me so far.

> >Targets that push function args can handle things fine as far as I see?
> >Targets that normally use push insns in the prologue will just have to
> >not do that for the components that are separately wrapped.  Or they can
> >still use pushes to reserve that space, if that works better.
> To me it's more about the fact that the offset to the slot where the 
> register should be saved varies (unless you have a frame pointer) and I 
> don't think there's enough information in any of the hook arguments to 
> allow derivation of that offset.

I think knowing in front of what BB to insert the prologue (or after what
BB, the epilogue) is all info we need?

And if even that is not good enough for any target then that target can
elect not to do separate shrink-wrapping at all ;-)

> But I don't consider that a flaw that should block this feature.  If 
> someday we want to implement on such a target, we'll have to figure out 
> how to compute that offset at an arbitrary location and pass along the 
> needed information to the hooks.

We probably need to be able to calculate that offset at the edges of any
BB for other reasons, already.


Segher
Jeff Law Sept. 12, 2016, 4:36 p.m. UTC | #13
On 09/09/2016 02:33 PM, Segher Boessenkool wrote:
> On Fri, Sep 09, 2016 at 12:28:12PM -0600, Jeff Law wrote:
>>> The generic code only does
>>>
>>> +  /* We don't handle "strange" functions.  */
>>> +  if (cfun->calls_alloca
>>> +      || cfun->calls_setjmp
>>> +      || cfun->can_throw_non_call_exceptions
>>> +      || crtl->calls_eh_return
>>> +      || crtl->has_nonlocal_goto
>>> +      || crtl->saves_all_registers)
>>> +    return;
>>>
>>> so that does not give up in "many" cases.
>> Doesn't seem like a lot to me either.
>
> A few of those could be handled, perhaps with some extra hooks, but it
> didn't look useful to me so far.
Agreed.  I don't think this is worth spending time on.


>
>>> Targets that push function args can handle things fine as far as I see?
>>> Targets that normally use push insns in the prologue will just have to
>>> not do that for the components that are separately wrapped.  Or they can
>>> still use pushes to reserve that space, if that works better.
>> To me it's more about the fact that the offset to the slot where the
>> register should be saved varies (unless you have a frame pointer) and I
>> don't think there's enough information in any of the hook arguments to
>> allow derivation of that offset.
>
> I think knowing in front of what BB to insert the prologue (or after what
> BB, the epilogue) is all info we need?
Maybe.  I'd be worried about things like deferred pops and 
combine-stack-adjustments.  The former are probably OK as I suspect we 
cleaned things up at basic block boundaries.  The latter I've never 
really looked at.

>
> And if even that is not good enough for any target then that target can
> elect not to do separate shrink-wrapping at all ;-)
Exactly.

Jeff

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a292ed..97d305f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2168,6 +2168,10 @@  Common Report Var(flag_shrink_wrap) Optimization
 Emit function prologues only before parts of the function that need it,
 rather than at the top of the function.
 
+fshrink-wrap-separate
+Common Report Var(flag_shrink_wrap_separate) Init(1) Optimization
+Shrink-wrap parts of the prologue and epilogue separately.
+
 fsignaling-nans
 Common Report Var(flag_signaling_nans) Optimization SetByCombined
 Disable optimizations observable by IEEE signaling NaNs.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 22001f9..2ea1727 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -397,7 +397,8 @@  Objective-C and Objective-C++ Dialects}.
 -fschedule-insns -fschedule-insns2 -fsection-anchors @gol
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
--fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
+-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate @gol
+-fsignaling-nans @gol
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
@@ -6322,6 +6323,7 @@  compilation time.
 -fmove-loop-invariants @gol
 -freorder-blocks @gol
 -fshrink-wrap @gol
+-fshrink-wrap-separate @gol
 -fsplit-wide-types @gol
 -fssa-backprop @gol
 -fssa-phiopt @gol
@@ -7231,6 +7233,13 @@  Emit function prologues only before parts of the function that need it,
 rather than at the top of the function.  This flag is enabled by default at
 @option{-O} and higher.
 
+@item -fshrink-wrap-separate
+@opindex fshrink-wrap-separate
+Shrink-wrap separate parts of the prologue and epilogue separately, so that
+those parts are only executed when needed.
+This option is on by default, but has no effect unless @option{-fshrink-wrap}
+is also turned on.
+
 @item -fcaller-saves
 @opindex fcaller-saves
 Enable allocation of values to registers that are clobbered by
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9edb006..5a5c5cab 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2924,6 +2924,7 @@  This describes the stack layout and calling conventions.
 * Function Entry::
 * Profiling::
 * Tail Calls::
+* Shrink-wrapping separate components::
 * Stack Smashing Protection::
 * Miscellaneous Register Hooks::
 @end menu
@@ -4852,6 +4853,59 @@  This hook should add additional registers that are computed by the prologue to t
 True if a function's return statements should be checked for matching the function's return type.  This includes checking for falling off the end of a non-void function.  Return false if no such check should be made.
 @end deftypefn
 
+@node Shrink-wrapping separate components
+@subsection Shrink-wrapping separate components
+@cindex shrink-wrapping separate components
+
+The prologue does a lot of separate things: save callee-saved registers,
+do whatever needs to be done to be able to call things (save the return
+address, align the stack, whatever; different for each target), set up a
+stack frame, do whatever needs to be done for the static chain (if anything),
+set up registers for PIC, etc.  Using the following hooks those prologue
+or epilogue components can be shrink-wrapped separately, so that the
+initialization (and possibly teardown) those components do is not done on
+execution paths where it is unnecessary.
+
+What exactly those components are is up to the target code; the generic
+code treats them abstractly.
+
+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS (void)
+This hook should return an @code{sbitmap} with the bits set for those
+components that can be separately shrink-wrapped in the current function.
+Return @code{NULL} if the current function should not get any separate
+shrink-wrapping.
+Don't define this hook if it would always return @code{NULL}.
+If it is defined, the other hooks in this group have to be defined as well.
+@end deftypefn
+
+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB (basic_block)
+This hook should return an @code{sbitmap} with the bits set for those
+components where either the prologue component has to be executed before
+the @code{basic_block}, or the epilogue component after it, or both.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS (sbitmap @var{components}, edge @var{e}, sbitmap @var{edge_components}, bool @var{is_prologue})
+This hook should clear the bits in the @var{components} bitmap for those
+components in @var{edge_components} that the target cannot handle on edge
+@var{e}, where @var{is_prologue} says if this is for a prologue or an
+epilogue instead.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS (sbitmap)
+Emit prologue insns for the components indicated by the parameter.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS (sbitmap)
+Emit epilogue insns for the components indicated by the parameter.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS (sbitmap)
+Mark the components in the parameter as handled, so that the
+@code{prologue} and @code{epilogue} named patterns know to ignore those
+components.  The target code should not hang on to the @code{sbitmap}, it
+will be deleted after this call.
+@end deftypefn
+
 @node Stack Smashing Protection
 @subsection Stack smashing protection
 @cindex stack smashing protection
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index a72c3d8..aaa11e6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2530,6 +2530,7 @@  This describes the stack layout and calling conventions.
 * Function Entry::
 * Profiling::
 * Tail Calls::
+* Shrink-wrapping separate components::
 * Stack Smashing Protection::
 * Miscellaneous Register Hooks::
 @end menu
@@ -3789,6 +3790,34 @@  the function prologue.  Normally, the profiling code comes after.
 
 @hook TARGET_WARN_FUNC_RETURN
 
+@node Shrink-wrapping separate components
+@subsection Shrink-wrapping separate components
+@cindex shrink-wrapping separate components
+
+The prologue does a lot of separate things: save callee-saved registers,
+do whatever needs to be done to be able to call things (save the return
+address, align the stack, whatever; different for each target), set up a
+stack frame, do whatever needs to be done for the static chain (if anything),
+set up registers for PIC, etc.  Using the following hooks those prologue
+or epilogue components can be shrink-wrapped separately, so that the
+initialization (and possibly teardown) those components do is not done on
+execution paths where it is unnecessary.
+
+What exactly those components are is up to the target code; the generic
+code treats them abstractly.
+
+@hook TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
+
+@hook TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
+
+@hook TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
+
+@hook TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
+
+@hook TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
+
+@hook TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS
+
 @node Stack Smashing Protection
 @subsection Stack smashing protection
 @cindex stack smashing protection
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 39dfce9..001f775 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -254,6 +254,10 @@  struct GTY(()) rtl_data {
   /* True if we performed shrink-wrapping for the current function.  */
   bool shrink_wrapped;
 
+  /* True if we performed shrink-wrapping for separate components for
+     the current function.  */
+  bool shrink_wrapped_separate;
+
   /* Nonzero if function being compiled doesn't modify the stack pointer
      (ignoring the prologue and epilogue).  This is only valid after
      pass_stack_ptr_mod has run.  */
diff --git a/gcc/target.def b/gcc/target.def
index 929d9ea..b951aec 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5773,6 +5773,63 @@  DEFHOOK
  bool, (tree),
  hook_bool_tree_true)
 
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_SHRINK_WRAP_"
+HOOK_VECTOR (TARGET_SHRINK_WRAP_HOOKS, shrink_wrap)
+
+DEFHOOK
+(get_separate_components,
+ "This hook should return an @code{sbitmap} with the bits set for those\n\
+components that can be separately shrink-wrapped in the current function.\n\
+Return @code{NULL} if the current function should not get any separate\n\
+shrink-wrapping.\n\
+Don't define this hook if it would always return @code{NULL}.\n\
+If it is defined, the other hooks in this group have to be defined as well.",
+ sbitmap, (void),
+ NULL)
+
+DEFHOOK
+(components_for_bb,
+ "This hook should return an @code{sbitmap} with the bits set for those\n\
+components where either the prologue component has to be executed before\n\
+the @code{basic_block}, or the epilogue component after it, or both.",
+ sbitmap, (basic_block),
+ NULL)
+
+DEFHOOK
+(disqualify_components,
+ "This hook should clear the bits in the @var{components} bitmap for those\n\
+components in @var{edge_components} that the target cannot handle on edge\n\
+@var{e}, where @var{is_prologue} says if this is for a prologue or an\n\
+epilogue instead.",
+ void, (sbitmap components, edge e, sbitmap edge_components, bool is_prologue),
+ NULL)
+
+DEFHOOK
+(emit_prologue_components,
+ "Emit prologue insns for the components indicated by the parameter.",
+ void, (sbitmap),
+ NULL)
+
+DEFHOOK
+(emit_epilogue_components,
+ "Emit epilogue insns for the components indicated by the parameter.",
+ void, (sbitmap),
+ NULL)
+
+DEFHOOK
+(set_handled_components,
+ "Mark the components in the parameter as handled, so that the\n\
+@code{prologue} and @code{epilogue} named patterns know to ignore those\n\
+components.  The target code should not hang on to the @code{sbitmap}, it\n\
+will be deleted after this call.",
+ void, (sbitmap),
+ NULL)
+
+HOOK_VECTOR_END (shrink_wrap)
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_"
+
 /* Determine the type of unwind info to emit for debugging.  */
 DEFHOOK
 (debug_unwind_info,