Patchwork [google] remove redundant push {lr} for -mthumb (issue4441050)

login
register
mail settings
Submitter Carrot Wei
Date April 19, 2011, 9:41 a.m.
Message ID <20110419094104.B5D2A20562@guozhiwei.sha.corp.google.com>
Download mbox | patch
Permalink /patch/91943/
State New
Headers show

Comments

Carrot Wei - April 19, 2011, 9:41 a.m.
Reload pass tries to determine the stack frame, so it needs to check the
push/pop lr optimization opportunity. One of the criteria is if there is any
far jump inside the function. Unfortunately at this time gcc can't decide each
instruction's length and basic block layout, so it can't know the offset of
a jump. To be conservative it assumes every jump is a far jump. So any jump
in a function will prevent this push/pop lr optimization.

To enable the push/pop lr optimization in reload pass, I compute the possible
maximum length of the function body. If the length is not large enough, far
jump is not necessary, so we can safely do push/pop lr optimization.

Tested on arm qemu with options -march=armv5te -mthumb, without regression.

This patch is for google/main.

2011-04-19  Guozhi Wei  <carrot@google.com>

	Google ref 40255.
	* gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
	(estimate_function_length): New function.
	(thumb_far_jump_used_p): No far jump is needed in short function.


--
This patch is available for review at http://codereview.appspot.com/4441050
Richard Guenther - April 19, 2011, 9:57 a.m.
On Tue, Apr 19, 2011 at 11:41 AM, Guozhi Wei <carrot@google.com> wrote:
> Reload pass tries to determine the stack frame, so it needs to check the
> push/pop lr optimization opportunity. One of the criteria is if there is any
> far jump inside the function. Unfortunately at this time gcc can't decide each
> instruction's length and basic block layout, so it can't know the offset of
> a jump. To be conservative it assumes every jump is a far jump. So any jump
> in a function will prevent this push/pop lr optimization.
>
> To enable the push/pop lr optimization in reload pass, I compute the possible
> maximum length of the function body. If the length is not large enough, far
> jump is not necessary, so we can safely do push/pop lr optimization.

What about hot/cold partitioning?  That might cause jumps to different
sections.

Richard.

> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
>
> This patch is for google/main.
>
> 2011-04-19  Guozhi Wei  <carrot@google.com>
>
>        Google ref 40255.
>        * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>        (estimate_function_length): New function.
>        (thumb_far_jump_used_p): No far jump is needed in short function.
>
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c        (revision 172689)
> +++ gcc/config/arm/arm.c        (working copy)
> @@ -592,7 +592,9 @@ static const struct default_options arm_
>   arm_preferred_rename_class
>
>  struct gcc_target targetm = TARGET_INITIALIZER;
> -
> +
> +#define SHORTEST_FAR_JUMP_LENGTH 2040
> +
>  /* Obstack for minipool constant handling.  */
>  static struct obstack minipool_obstack;
>  static char *         minipool_startobj;
> @@ -20298,6 +20300,17 @@ thumb_shiftable_const (unsigned HOST_WID
>   return 0;
>  }
>
> +/* Computes the maximum possible function length. */
> +static int
> +estimate_function_length (void)
> +{
> +  rtx insn;
> +  int length = 0;
> +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +    length += get_attr_length(insn);
> +  return length;
> +}
> +
>  /* Returns nonzero if the current function contains,
>    or might contain a far jump.  */
>  static int
> @@ -20316,6 +20329,16 @@ thumb_far_jump_used_p (void)
>   if (cfun->machine->far_jump_used)
>     return 1;
>
> +  /* In reload pass we haven't got the exact jump instruction length,
> +     but we can get a reasonable estimation based on the maximum
> +     possible function length. */
> +  if (!reload_completed)
> +    {
> +      int function_length = estimate_function_length();
> +      if (function_length < SHORTEST_FAR_JUMP_LENGTH)
> +       return 0;
> +    }
> +
>   /* If this function is not being called from the prologue/epilogue
>      generation code then it must be being called from the
>      INITIAL_ELIMINATION_OFFSET macro.  */
>
> --
> This patch is available for review at http://codereview.appspot.com/4441050
>
Carrot Wei - April 19, 2011, 10:12 a.m.
On Tue, Apr 19, 2011 at 5:57 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Apr 19, 2011 at 11:41 AM, Guozhi Wei <carrot@google.com> wrote:
>> Reload pass tries to determine the stack frame, so it needs to check the
>> push/pop lr optimization opportunity. One of the criteria is if there is any
>> far jump inside the function. Unfortunately at this time gcc can't decide each
>> instruction's length and basic block layout, so it can't know the offset of
>> a jump. To be conservative it assumes every jump is a far jump. So any jump
>> in a function will prevent this push/pop lr optimization.
>>
>> To enable the push/pop lr optimization in reload pass, I compute the possible
>> maximum length of the function body. If the length is not large enough, far
>> jump is not necessary, so we can safely do push/pop lr optimization.
>
> What about hot/cold partitioning?  That might cause jumps to different
> sections.
>
> Richard.
>

The hot/cold partitioning is disabled in arm backend.
http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00671.html.

thanks
Carrot
Richard Earnshaw - April 19, 2011, 12:55 p.m.
On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
> Reload pass tries to determine the stack frame, so it needs to check the
> push/pop lr optimization opportunity. One of the criteria is if there is any
> far jump inside the function. Unfortunately at this time gcc can't decide each
> instruction's length and basic block layout, so it can't know the offset of
> a jump. To be conservative it assumes every jump is a far jump. So any jump
> in a function will prevent this push/pop lr optimization.
> 
> To enable the push/pop lr optimization in reload pass, I compute the possible
> maximum length of the function body. If the length is not large enough, far
> jump is not necessary, so we can safely do push/pop lr optimization.
> 
> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
> 
> This patch is for google/main.
> 
> 2011-04-19  Guozhi Wei  <carrot@google.com>
> 
> 	Google ref 40255.
> 	* gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
> 	(estimate_function_length): New function.
> 	(thumb_far_jump_used_p): No far jump is needed in short function.
> 

Setting aside for the moment Richi's issue with hot/cold sections, this
isn't safe.  Firstly get_attr_length() doesn't return the worst case
length; and secondly, it doesn't take into account the size of reload
insns that are still on the reloads stack -- these are only emitted
right at the end of the reload pass.  Both of these would need to be
addressed before this can be safely done.

It's worth noting here that in the dim and distant past we used to try
to estimate the size of the function and eliminate redundant saves of
R14, but the code had to be removed because it was too fragile; but it
looks like some vestiges of the code are still in the compiler.

A slightly less optimistic approach, but one that is much safer is to
scan the function after reload has completed and see if we can avoid
having to push LR.  We can do this if:

- The function makes no calls
- The function saves nothing on the stack other than r14
- It's small enough (by this point we can use get_attr_length)
- R14 is only modified by internal jump instructions

There's already some code to try to do this in the ARM back-end (look
for lr_save_eliminated), but it's probably not doing its job properly
because it tries to cache the result early on (it's costly to work this
out) and at the time its first called we cannot assert that the register
won't be live.

R.
Carrot Wei - April 20, 2011, 8:26 a.m.
On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
>> Reload pass tries to determine the stack frame, so it needs to check the
>> push/pop lr optimization opportunity. One of the criteria is if there is any
>> far jump inside the function. Unfortunately at this time gcc can't decide each
>> instruction's length and basic block layout, so it can't know the offset of
>> a jump. To be conservative it assumes every jump is a far jump. So any jump
>> in a function will prevent this push/pop lr optimization.
>>
>> To enable the push/pop lr optimization in reload pass, I compute the possible
>> maximum length of the function body. If the length is not large enough, far
>> jump is not necessary, so we can safely do push/pop lr optimization.
>>
>> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
>>
>> This patch is for google/main.
>>
>> 2011-04-19  Guozhi Wei  <carrot@google.com>
>>
>>       Google ref 40255.
>>       * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>>       (estimate_function_length): New function.
>>       (thumb_far_jump_used_p): No far jump is needed in short function.
>>
>
> Setting aside for the moment Richi's issue with hot/cold sections, this
> isn't safe.  Firstly get_attr_length() doesn't return the worst case
> length; and secondly, it doesn't take into account the size of reload
> insns that are still on the reloads stack -- these are only emitted
> right at the end of the reload pass.  Both of these would need to be
> addressed before this can be safely done.
>
> It's worth noting here that in the dim and distant past we used to try
> to estimate the size of the function and eliminate redundant saves of
> R14, but the code had to be removed because it was too fragile; but it
> looks like some vestiges of the code are still in the compiler.
>
> A slightly less optimistic approach, but one that is much safer is to
> scan the function after reload has completed and see if we can avoid
> having to push LR.  We can do this if:
>
I guess "less optimistic" is relative to the ideal optimization
situation, I believe it is still much better than current result. Do
you think if arm_reorg() is appropriate place to do this?

thanks
Carrot
Richard Earnshaw - April 20, 2011, 8:48 a.m.
On Wed, 2011-04-20 at 16:26 +0800, Carrot Wei wrote:
> On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> >
> > On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
> >> Reload pass tries to determine the stack frame, so it needs to check the
> >> push/pop lr optimization opportunity. One of the criteria is if there is any
> >> far jump inside the function. Unfortunately at this time gcc can't decide each
> >> instruction's length and basic block layout, so it can't know the offset of
> >> a jump. To be conservative it assumes every jump is a far jump. So any jump
> >> in a function will prevent this push/pop lr optimization.
> >>
> >> To enable the push/pop lr optimization in reload pass, I compute the possible
> >> maximum length of the function body. If the length is not large enough, far
> >> jump is not necessary, so we can safely do push/pop lr optimization.
> >>
> >> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
> >>
> >> This patch is for google/main.
> >>
> >> 2011-04-19  Guozhi Wei  <carrot@google.com>
> >>
> >>       Google ref 40255.
> >>       * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
> >>       (estimate_function_length): New function.
> >>       (thumb_far_jump_used_p): No far jump is needed in short function.
> >>
> >
> > Setting aside for the moment Richi's issue with hot/cold sections, this
> > isn't safe.  Firstly get_attr_length() doesn't return the worst case
> > length; and secondly, it doesn't take into account the size of reload
> > insns that are still on the reloads stack -- these are only emitted
> > right at the end of the reload pass.  Both of these would need to be
> > addressed before this can be safely done.
> >
> > It's worth noting here that in the dim and distant past we used to try
> > to estimate the size of the function and eliminate redundant saves of
> > R14, but the code had to be removed because it was too fragile; but it
> > looks like some vestiges of the code are still in the compiler.
> >
> > A slightly less optimistic approach, but one that is much safer is to
> > scan the function after reload has completed and see if we can avoid
> > having to push LR.  We can do this if:
> >
> I guess "less optimistic" is relative to the ideal optimization
> situation, I believe it is still much better than current result. Do
> you think if arm_reorg() is appropriate place to do this?
> 

Making the decision in a single pass would certainly be the best
approach; and arm_reorg is certainly going to come after all other major
code re-arrangements.  Indeed, you should probably do this after the
minipool placement so that you can be sure that these don't bulk up the
body of the function too much.

As you are doing the elimination late on in the compilation you can do a
better job of estimation by calling shorten_branches() to work out the
precise length of each insn.  Then you can simply scan over the insns to
work out if there is a branch that still needs r14.

R.
Carrot Wei - April 20, 2011, 9:16 a.m.
I will try this method for trunk later.

thanks
Carrot

On Wed, Apr 20, 2011 at 4:48 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Wed, 2011-04-20 at 16:26 +0800, Carrot Wei wrote:
>> On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> >
>> > On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
>> >> Reload pass tries to determine the stack frame, so it needs to check the
>> >> push/pop lr optimization opportunity. One of the criteria is if there is any
>> >> far jump inside the function. Unfortunately at this time gcc can't decide each
>> >> instruction's length and basic block layout, so it can't know the offset of
>> >> a jump. To be conservative it assumes every jump is a far jump. So any jump
>> >> in a function will prevent this push/pop lr optimization.
>> >>
>> >> To enable the push/pop lr optimization in reload pass, I compute the possible
>> >> maximum length of the function body. If the length is not large enough, far
>> >> jump is not necessary, so we can safely do push/pop lr optimization.
>> >>
>> >> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
>> >>
>> >> This patch is for google/main.
>> >>
>> >> 2011-04-19  Guozhi Wei  <carrot@google.com>
>> >>
>> >>       Google ref 40255.
>> >>       * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>> >>       (estimate_function_length): New function.
>> >>       (thumb_far_jump_used_p): No far jump is needed in short function.
>> >>
>> >
>> > Setting aside for the moment Richi's issue with hot/cold sections, this
>> > isn't safe.  Firstly get_attr_length() doesn't return the worst case
>> > length; and secondly, it doesn't take into account the size of reload
>> > insns that are still on the reloads stack -- these are only emitted
>> > right at the end of the reload pass.  Both of these would need to be
>> > addressed before this can be safely done.
>> >
>> > It's worth noting here that in the dim and distant past we used to try
>> > to estimate the size of the function and eliminate redundant saves of
>> > R14, but the code had to be removed because it was too fragile; but it
>> > looks like some vestiges of the code are still in the compiler.
>> >
>> > A slightly less optimistic approach, but one that is much safer is to
>> > scan the function after reload has completed and see if we can avoid
>> > having to push LR.  We can do this if:
>> >
>> I guess "less optimistic" is relative to the ideal optimization
>> situation, I believe it is still much better than current result. Do
>> you think if arm_reorg() is appropriate place to do this?
>>
>
> Making the decision in a single pass would certainly be the best
> approach; and arm_reorg is certainly going to come after all other major
> code re-arrangements.  Indeed, you should probably do this after the
> minipool placement so that you can be sure that these don't bulk up the
> body of the function too much.
>
> As you are doing the elimination late on in the compilation you can do a
> better job of estimation by calling shorten_branches() to work out the
> precise length of each insn.  Then you can simply scan over the insns to
> work out if there is a branch that still needs r14.
>
> R.
>
>
>
>

Patch

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 172689)
+++ gcc/config/arm/arm.c	(working copy)
@@ -592,7 +592,9 @@  static const struct default_options arm_
   arm_preferred_rename_class
 
 struct gcc_target targetm = TARGET_INITIALIZER;
-
+
+#define SHORTEST_FAR_JUMP_LENGTH 2040
+
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
 static char *         minipool_startobj;
@@ -20298,6 +20300,17 @@  thumb_shiftable_const (unsigned HOST_WID
   return 0;
 }
 
+/* Computes the maximum possible function length. */
+static int
+estimate_function_length (void)
+{
+  rtx insn;
+  int length = 0;
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    length += get_attr_length(insn);
+  return length;
+}
+
 /* Returns nonzero if the current function contains,
    or might contain a far jump.  */
 static int
@@ -20316,6 +20329,16 @@  thumb_far_jump_used_p (void)
   if (cfun->machine->far_jump_used)
     return 1;
 
+  /* In reload pass we haven't got the exact jump instruction length,
+     but we can get a reasonable estimation based on the maximum
+     possible function length. */
+  if (!reload_completed)
+    {
+      int function_length = estimate_function_length();
+      if (function_length < SHORTEST_FAR_JUMP_LENGTH)
+	return 0;
+    }
+
   /* If this function is not being called from the prologue/epilogue
      generation code then it must be being called from the
      INITIAL_ELIMINATION_OFFSET macro.  */