diff mbox

[RFC] Remove PUSH_ARGS_REVERSED from the RTL expander.

Message ID 1395661485-28060-1-git-send-email-james.greenhalgh@arm.com
State New
Headers show

Commit Message

James Greenhalgh March 24, 2014, 11:44 a.m. UTC
Hi,

Consider this testcase:

  extern void foo (int a, int b, int c, int d);

  void
  bar (int b, int c, int d)
  {
      foo (3, b, c, d);
  }

For many ABI's we will have on entry to the function

  r0 = b
  r1 = c
  r2 = d

If we process arguments to the call to foo left-to-right
(PUSH_ARGS_REVERSED == 0) we assign temporaries, and then hard registers
in this order:

  t0 = 3
  t1 = r0
  t2 = r1
  t3 = r2

  r0 = t0
  r1 = t1
  r2 = t2
  r3 = t3

We can't eliminate any of these temporaries as their live ranges overlap.

However, If we process the arguments right-to-left (PUSH_ARGS_REVERSED == 1),
we get this order:

  t0 = 3
  t1 = r0
  t2 = r1
  t3 = r2

  r3 = t3
  r2 = t2
  r1 = t1
  r0 = t0

And then we can start eliminating temporaries we don't need and get:

  r3 = r2
  r2 = r1
  r1 = r0
  r0 = 3

Which is much neater.

It seems to me that this would probably be of benefit on all targets.

This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default
for all targets. However, this would have a perceivable impact on argument
evaluation order (which is unspecified in C, but people have shown
sensitivity to it changing in the past and we suspect some people may
have built systems relying on the behviour... ho hum...).

However, now that all side effects are handled when we are in SSA
form, it should be possible to refactor calls.c and expr.c as if
PUSH_ARGS_REVERSED were always defined to 1, without changing the
argument evaluation order in gimplify.c.

In this way, we have the best of both worlds; we maintain argument
evaluation order and gain the optimizations given by removing
overlapping live ranges for parameters.

I've bootstrapped and regression tested this on x86, ARM and AArch64 - but
I am well aware these are fairly standard targets, do you have any
suggestions or comments on which targets this change will affect?

If not, is this OK for stage-1?

Thanks,
James

---
gcc/

2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* calls.c (initialize_argument_information): Always treat
	PUSH_ARGS_REVERSED as 1, simplify code accordingly.
	(expand_call): Likewise.
	(emit_library_call_calue_1): Likewise.
	* expr.c (PUSH_ARGS_REVERSED): Do not define.
	(emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
	code accordingly.

Comments

Richard Biener March 24, 2014, 11:54 a.m. UTC | #1
On Mon, Mar 24, 2014 at 12:44 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> Consider this testcase:
>
>   extern void foo (int a, int b, int c, int d);
>
>   void
>   bar (int b, int c, int d)
>   {
>       foo (3, b, c, d);
>   }
>
> For many ABI's we will have on entry to the function
>
>   r0 = b
>   r1 = c
>   r2 = d
>
> If we process arguments to the call to foo left-to-right
> (PUSH_ARGS_REVERSED == 0) we assign temporaries, and then hard registers
> in this order:
>
>   t0 = 3
>   t1 = r0
>   t2 = r1
>   t3 = r2
>
>   r0 = t0
>   r1 = t1
>   r2 = t2
>   r3 = t3
>
> We can't eliminate any of these temporaries as their live ranges overlap.
>
> However, If we process the arguments right-to-left (PUSH_ARGS_REVERSED == 1),
> we get this order:
>
>   t0 = 3
>   t1 = r0
>   t2 = r1
>   t3 = r2
>
>   r3 = t3
>   r2 = t2
>   r1 = t1
>   r0 = t0
>
> And then we can start eliminating temporaries we don't need and get:
>
>   r3 = r2
>   r2 = r1
>   r1 = r0
>   r0 = 3
>
> Which is much neater.
>
> It seems to me that this would probably be of benefit on all targets.
>
> This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default
> for all targets. However, this would have a perceivable impact on argument
> evaluation order (which is unspecified in C, but people have shown
> sensitivity to it changing in the past and we suspect some people may
> have built systems relying on the behviour... ho hum...).
>
> However, now that all side effects are handled when we are in SSA
> form, it should be possible to refactor calls.c and expr.c as if
> PUSH_ARGS_REVERSED were always defined to 1, without changing the
> argument evaluation order in gimplify.c.
>
> In this way, we have the best of both worlds; we maintain argument
> evaluation order and gain the optimizations given by removing
> overlapping live ranges for parameters.
>
> I've bootstrapped and regression tested this on x86, ARM and AArch64 - but
> I am well aware these are fairly standard targets, do you have any
> suggestions or comments on which targets this change will affect?
>
> If not, is this OK for stage-1?

Maybe somebody remembers why we have both paths.  I'd rather
make gimplification independent of this as well, choosing either
variant.

Do you have any hard numbers to compare?  Say cc1 code size,
when comparing PUSH_ARGS_REVERSED 1 vs 0?

Thanks,
Richard.

> Thanks,
> James
>
> ---
> gcc/
>
> 2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * calls.c (initialize_argument_information): Always treat
>         PUSH_ARGS_REVERSED as 1, simplify code accordingly.
>         (expand_call): Likewise.
>         (emit_library_call_calue_1): Likewise.
>         * expr.c (PUSH_ARGS_REVERSED): Do not define.
>         (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
>         code accordingly.
Michael Matz March 24, 2014, 1:23 p.m. UTC | #2
Hi,

On Mon, 24 Mar 2014, Richard Biener wrote:

> Maybe somebody remembers why we have both paths.  I'd rather make 
> gimplification independent of this as well, choosing either variant.

I can't say why we have the !PUSH_ARGS_REVERSED path (so I guess that's 
the way that initially was there, before the distinction was made), but 
the PUSH_ARGS_REVERSED==1 path is important when you have push 
instructions and arguments grow in the opposite direction of the stack.  
The latter is the case for almost all targets, so as soon as you have push 
instructions (and no ACCUMULATE_OUTGOING_ARGS) you'll want to have 
PUSH_ARGS_REVERSED==1 as well.

I guess the !PUSH_ARGS_REVERSED path only has advantages for the few 
targets where stack and args grow the same way and that use push 
instructions for argument passing, which currently are none.  pa-64 does 
have same-direction arguments and stack (both upward), but doesn't use 
push, and the only other ARGS_GROW_DOWNWARD (pa-32bit and stormy16) also 
have !STACK_GROWS_DOWNWARD, so again opposite direction.

Without push instructions or with ACCUMULATE_OUTGOING_ARGS I guess it 
doesn't matter much for parameters passed in memory (the offsets simply 
will be N to 0 instead of 0 to N), and for register arguments there's a 
slight advantage for PUSH_ARGS_REVERSED==1 for the usual case of 
same-ordered arguments passed down, like in James' example.

> Do you have any hard numbers to compare?  Say cc1 code size, when 
> comparing PUSH_ARGS_REVERSED 1 vs 0?

If we'd agree on only one way, then it must be PUSH_ARGS_REVERSED==1.  
That would be a change in argument gimplification (and hence eval) order 
for many targets, and could result in unexpected side-effects for invalid 
code, like James said.  IMO no reason to not try that in stage 1, though.


Ciao,
Michael.
James Greenhalgh March 24, 2014, 6:53 p.m. UTC | #3
On Mon, Mar 24, 2014 at 11:54:49AM +0000, Richard Biener wrote:
> On Mon, Mar 24, 2014 at 12:44 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > Consider this testcase:
> >
> >   extern void foo (int a, int b, int c, int d);
> >
> >   void
> >   bar (int b, int c, int d)
> >   {
> >       foo (3, b, c, d);
> >   }
> >
> > For many ABI's we will have on entry to the function
> >
> >   r0 = b
> >   r1 = c
> >   r2 = d
> >
> > If we process arguments to the call to foo left-to-right
> > (PUSH_ARGS_REVERSED == 0) we assign temporaries, and then hard registers
> > in this order:
> >
> >   t0 = 3
> >   t1 = r0
> >   t2 = r1
> >   t3 = r2
> >
> >   r0 = t0
> >   r1 = t1
> >   r2 = t2
> >   r3 = t3
> >
> > We can't eliminate any of these temporaries as their live ranges overlap.
> >
> > However, If we process the arguments right-to-left (PUSH_ARGS_REVERSED == 1),
> > we get this order:
> >
> >   t0 = 3
> >   t1 = r0
> >   t2 = r1
> >   t3 = r2
> >
> >   r3 = t3
> >   r2 = t2
> >   r1 = t1
> >   r0 = t0
> >
> > And then we can start eliminating temporaries we don't need and get:
> >
> >   r3 = r2
> >   r2 = r1
> >   r1 = r0
> >   r0 = 3
> >
> > Which is much neater.
> >
> > It seems to me that this would probably be of benefit on all targets.
> >
> > This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default
> > for all targets. However, this would have a perceivable impact on argument
> > evaluation order (which is unspecified in C, but people have shown
> > sensitivity to it changing in the past and we suspect some people may
> > have built systems relying on the behviour... ho hum...).
> >
> > However, now that all side effects are handled when we are in SSA
> > form, it should be possible to refactor calls.c and expr.c as if
> > PUSH_ARGS_REVERSED were always defined to 1, without changing the
> > argument evaluation order in gimplify.c.
> >
> > In this way, we have the best of both worlds; we maintain argument
> > evaluation order and gain the optimizations given by removing
> > overlapping live ranges for parameters.
> >
> > I've bootstrapped and regression tested this on x86, ARM and AArch64 - but
> > I am well aware these are fairly standard targets, do you have any
> > suggestions or comments on which targets this change will affect?
> >
> > If not, is this OK for stage-1?
> 
> Maybe somebody remembers why we have both paths.  I'd rather
> make gimplification independent of this as well, choosing either
> variant.
> 
> Do you have any hard numbers to compare?  Say cc1 code size,
> when comparing PUSH_ARGS_REVERSED 1 vs 0?

Sure, here is the output of size for AArch64 and ARM. Both of these targets
have PUSH_ARGS_REVERSED == 0. i386 results are not interesting (on the order
of tens of bytes), as PUSH_ARGS_REVERSED == 1 for that target.

trunk is revision 208685
patched is revision 208685 with this patch applied.

AArch64:

   text    data     bss     dec     hex     filename
14546902   82424  781944 15411270  eb2846   trunk/gcc/cc1
14423510   82424  781944 15287878  e94646   patched/gcc/cc1
16721485   82480  805000 17608965  10cb105  trunk/gcc/cc1plus
16564613   82480  805000 17452093  10a4c3d  patched/gcc/cc1plus
724550     7904   80744  813198    c688e    trunk/gcc/gfortran
722606     7904   80744  811254    c60f6    patched/gcc/gfortran

ARM:

   text    data     bss     dec     hex     filename
15176835   31880  694132 15902847  f2a87f   trunk/gcc/cc1
15171939   31880  694124 15897943  f29557   patched/gcc/cc1
17255818   31916  706404 17994138  112919a  trunk/gcc/cc1plus
17250418   31916  706396 17988730  1127c7a  patched/gcc/cc1plus
666307     4824   26780  697911    aa637    trunk/gcc/gfortran
664887     4824   26780  696491    aa0ab    patched/gcc/gfortran

AArch64 benefits more from the optimization, but there is also an
improvement for ARM.

Thanks,
James

> > ---
> > gcc/
> >
> > 2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> >         * calls.c (initialize_argument_information): Always treat
> >         PUSH_ARGS_REVERSED as 1, simplify code accordingly.
> >         (expand_call): Likewise.
> >         (emit_library_call_calue_1): Likewise.
> >         * expr.c (PUSH_ARGS_REVERSED): Do not define.
> >         (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
> >         code accordingly.
>
Richard Henderson March 24, 2014, 8:15 p.m. UTC | #4
On 03/24/2014 06:23 AM, Michael Matz wrote:
> I can't say why we have the !PUSH_ARGS_REVERSED path (so I guess that's 
> the way that initially was there, before the distinction was made), but 
> the PUSH_ARGS_REVERSED==1 path is important when you have push 
> instructions and arguments grow in the opposite direction of the stack.  
> The latter is the case for almost all targets, so as soon as you have push 
> instructions (and no ACCUMULATE_OUTGOING_ARGS) you'll want to have 
> PUSH_ARGS_REVERSED==1 as well.
> 
> I guess the !PUSH_ARGS_REVERSED path only has advantages for the few 
> targets where stack and args grow the same way and that use push 
> instructions for argument passing, which currently are none.  pa-64 does 
> have same-direction arguments and stack (both upward), but doesn't use 
> push, and the only other ARGS_GROW_DOWNWARD (pa-32bit and stormy16) also 
> have !STACK_GROWS_DOWNWARD, so again opposite direction.
> 

See

  http://en.wikipedia.org/wiki/X86_calling_conventions#pascal

Since we don't actually support this anymore, we can certainly tidy this up.


r~
Michael Matz March 25, 2014, 1:18 p.m. UTC | #5
Hi,

On Mon, 24 Mar 2014, Richard Henderson wrote:

> See
> 
>   http://en.wikipedia.org/wiki/X86_calling_conventions#pascal
> 
> Since we don't actually support this anymore, we can certainly tidy this 
> up.

Yeah, I thought about that, but I couldn't see how that could have used 
PUSH_ARGS_REVERSED.  The above calling convention would be a language 
setting while the macro is a target setting, and for interoperability even 
the Pascal compiler would have to be able to create some calls with 
reversed arguments.  So I thought that Pascal never was the purpose for 
the macro.  (I didn't check to make sure, though)


Ciao,
Michael.
Jeff Law April 25, 2014, 8:15 p.m. UTC | #6
On 03/24/14 05:44, James Greenhalgh wrote:
> Which is much neater.
>
> It seems to me that this would probably be of benefit on all targets.
>
> This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default
> for all targets. However, this would have a perceivable impact on argument
> evaluation order (which is unspecified in C, but people have shown
> sensitivity to it changing in the past and we suspect some people may
> have built systems relying on the behviour... ho hum...).
>
> However, now that all side effects are handled when we are in SSA
> form, it should be possible to refactor calls.c and expr.c as if
> PUSH_ARGS_REVERSED were always defined to 1, without changing the
> argument evaluation order in gimplify.c.
>
> In this way, we have the best of both worlds; we maintain argument
> evaluation order and gain the optimizations given by removing
> overlapping live ranges for parameters.
>
> I've bootstrapped and regression tested this on x86, ARM and AArch64 - but
> I am well aware these are fairly standard targets, do you have any
> suggestions or comments on which targets this change will affect?
>
> If not, is this OK for stage-1?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* calls.c (initialize_argument_information): Always treat
> 	PUSH_ARGS_REVERSED as 1, simplify code accordingly.
> 	(expand_call): Likewise.
> 	(emit_library_call_calue_1): Likewise.
> 	* expr.c (PUSH_ARGS_REVERSED): Do not define.
> 	(emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
> 	code accordingly.
This is fine for the trunk at this point.

Sorry for the delay.

jeff
Eric Botcazou April 26, 2014, 1:25 p.m. UTC | #7
> > 2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>
> > 
> > 	* calls.c (initialize_argument_information): Always treat
> > 	PUSH_ARGS_REVERSED as 1, simplify code accordingly.
> > 	(expand_call): Likewise.
> > 	(emit_library_call_calue_1): Likewise.
> > 	* expr.c (PUSH_ARGS_REVERSED): Do not define.
> > 	(emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
> > 	code accordingly.
> 
> This is fine for the trunk at this point.

Are you sure that it's not a correctness issue for some targets though?
Richard Earnshaw April 29, 2014, 10:04 a.m. UTC | #8
On 26/04/14 14:25, Eric Botcazou wrote:
>>> 2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>
>>>
>>> 	* calls.c (initialize_argument_information): Always treat
>>> 	PUSH_ARGS_REVERSED as 1, simplify code accordingly.
>>> 	(expand_call): Likewise.
>>> 	(emit_library_call_calue_1): Likewise.
>>> 	* expr.c (PUSH_ARGS_REVERSED): Do not define.
>>> 	(emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
>>> 	code accordingly.
>>
>> This is fine for the trunk at this point.
> 
> Are you sure that it's not a correctness issue for some targets though?
> 

Ordering of side-effects these days are handled during gimplification.
If there are any correctness issues relating to this, then I think we've
got a bigger problem.

R.
James Greenhalgh April 29, 2014, 1:49 p.m. UTC | #9
On Fri, Apr 25, 2014 at 09:15:57PM +0100, Jeff Law wrote:
> On 03/24/14 05:44, James Greenhalgh wrote:
> > Which is much neater.
> >
> > It seems to me that this would probably be of benefit on all targets.
> >
> > This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default
> > for all targets. However, this would have a perceivable impact on argument
> > evaluation order (which is unspecified in C, but people have shown
> > sensitivity to it changing in the past and we suspect some people may
> > have built systems relying on the behviour... ho hum...).
> >
> > However, now that all side effects are handled when we are in SSA
> > form, it should be possible to refactor calls.c and expr.c as if
> > PUSH_ARGS_REVERSED were always defined to 1, without changing the
> > argument evaluation order in gimplify.c.
> >
> > In this way, we have the best of both worlds; we maintain argument
> > evaluation order and gain the optimizations given by removing
> > overlapping live ranges for parameters.
> >
> > I've bootstrapped and regression tested this on x86, ARM and AArch64 - but
> > I am well aware these are fairly standard targets, do you have any
> > suggestions or comments on which targets this change will affect?
> >
> > If not, is this OK for stage-1?
> >
> > Thanks,
> > James
> >
> > ---
> > gcc/
> >
> > 2014-03-21  James Greenhalgh  <james.greenhalgh@arm.com>
> >
> > 	* calls.c (initialize_argument_information): Always treat
> > 	PUSH_ARGS_REVERSED as 1, simplify code accordingly.
> > 	(expand_call): Likewise.
> > 	(emit_library_call_calue_1): Likewise.
> > 	* expr.c (PUSH_ARGS_REVERSED): Do not define.
> > 	(emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify
> > 	code accordingly.
> This is fine for the trunk at this point.
> 
> Sorry for the delay.

Thanks Jeff.

I've committed this as revision 209897 after checking it still bootstraps
across arm/aarch64/i386. I'll look out for any fallout on other targets, please
shout if I've broken something!

Thanks,
James
diff mbox

Patch

diff --git a/gcc/calls.c b/gcc/calls.c
index f392319..2e91918 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1104,8 +1104,6 @@  initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
 {
   CUMULATIVE_ARGS *args_so_far_pnt = get_cumulative_args (args_so_far);
   location_t loc = EXPR_LOCATION (exp);
-  /* 1 if scanning parms front to back, -1 if scanning back to front.  */
-  int inc;
 
   /* Count arg position in order args appear.  */
   int argpos;
@@ -1116,22 +1114,9 @@  initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
   args_size->var = 0;
 
   /* In this loop, we consider args in the order they are written.
-     We fill up ARGS from the front or from the back if necessary
-     so that in any case the first arg to be pushed ends up at the front.  */
+     We fill up ARGS from the back.  */
 
-  if (PUSH_ARGS_REVERSED)
-    {
-      i = num_actuals - 1, inc = -1;
-      /* In this case, must reverse order of args
-	 so that we compute and push the last arg first.  */
-    }
-  else
-    {
-      i = 0, inc = 1;
-    }
-
-  /* First fill in the actual arguments in the ARGS array, splitting
-     complex arguments if necessary.  */
+  i = num_actuals - 1;
   {
     int j = i;
     call_expr_arg_iterator iter;
@@ -1140,7 +1125,7 @@  initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
     if (struct_value_addr_value)
       {
 	args[j].tree_value = struct_value_addr_value;
-	j += inc;
+	j--;
       }
     FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
       {
@@ -1152,17 +1137,17 @@  initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
 	  {
 	    tree subtype = TREE_TYPE (argtype);
 	    args[j].tree_value = build1 (REALPART_EXPR, subtype, arg);
-	    j += inc;
+	    j--;
 	    args[j].tree_value = build1 (IMAGPART_EXPR, subtype, arg);
 	  }
 	else
 	  args[j].tree_value = arg;
-	j += inc;
+	j--;
       }
   }
 
   /* I counts args in order (to be) pushed; ARGPOS counts in order written.  */
-  for (argpos = 0; argpos < num_actuals; i += inc, argpos++)
+  for (argpos = 0; argpos < num_actuals; i--, argpos++)
     {
       tree type = TREE_TYPE (args[i].tree_value);
       int unsignedp;
@@ -2952,9 +2937,8 @@  expand_call (tree exp, rtx target, int ignore)
 
       compute_argument_addresses (args, argblock, num_actuals);
 
-      /* If we push args individually in reverse order, perform stack alignment
-	 before the first push (the last arg).  */
-      if (PUSH_ARGS_REVERSED && argblock == 0
+      /* Perform stack alignment before the first push (the last arg).  */
+      if (argblock == 0
           && adjusted_args_size.constant > reg_parm_stack_space
 	  && adjusted_args_size.constant != unadjusted_args_size)
 	{
@@ -3097,12 +3081,6 @@  expand_call (tree exp, rtx target, int ignore)
 		sibcall_failure = 1;
 	    }
 
-      /* If we pushed args in forward order, perform stack alignment
-	 after pushing the last arg.  */
-      if (!PUSH_ARGS_REVERSED && argblock == 0)
-	anti_adjust_stack (GEN_INT (adjusted_args_size.constant
-				    - unadjusted_args_size));
-
       /* If register arguments require space on the stack and stack space
 	 was not preallocated, allocate stack space here for arguments
 	 passed in registers.  */
@@ -3152,8 +3130,7 @@  expand_call (tree exp, rtx target, int ignore)
       if (pass == 1 && (return_flags & ERF_RETURNS_ARG))
 	{
 	  int arg_nr = return_flags & ERF_RETURN_ARG_MASK;
-	  if (PUSH_ARGS_REVERSED)
-	    arg_nr = num_actuals - arg_nr - 1;
+	  arg_nr = num_actuals - arg_nr - 1;
 	  if (arg_nr >= 0
 	      && arg_nr < num_actuals
 	      && args[arg_nr].reg
@@ -3597,7 +3574,6 @@  emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
      isn't present here, so we default to native calling abi here.  */
   tree fndecl ATTRIBUTE_UNUSED = NULL_TREE; /* library calls default to host calling abi ? */
   tree fntype ATTRIBUTE_UNUSED = NULL_TREE; /* library calls default to host calling abi ? */
-  int inc;
   int count;
   rtx argblock = 0;
   CUMULATIVE_ARGS args_so_far_v;
@@ -3946,22 +3922,13 @@  emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 	argblock = push_block (GEN_INT (args_size.constant), 0, 0);
     }
 
-  /* If we push args individually in reverse order, perform stack alignment
+  /* We push args individually in reverse order, perform stack alignment
      before the first push (the last arg).  */
-  if (argblock == 0 && PUSH_ARGS_REVERSED)
+  if (argblock == 0)
     anti_adjust_stack (GEN_INT (args_size.constant
 				- original_args_size.constant));
 
-  if (PUSH_ARGS_REVERSED)
-    {
-      inc = -1;
-      argnum = nargs - 1;
-    }
-  else
-    {
-      inc = 1;
-      argnum = 0;
-    }
+  argnum = nargs - 1;
 
 #ifdef REG_PARM_STACK_SPACE
   if (ACCUMULATE_OUTGOING_ARGS)
@@ -3978,7 +3945,7 @@  emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 
   /* ARGNUM indexes the ARGVEC array in the order in which the arguments
      are to be pushed.  */
-  for (count = 0; count < nargs; count++, argnum += inc)
+  for (count = 0; count < nargs; count++, argnum--)
     {
       enum machine_mode mode = argvec[argnum].mode;
       rtx val = argvec[argnum].value;
@@ -4080,16 +4047,7 @@  emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 	}
     }
 
-  /* If we pushed args in forward order, perform stack alignment
-     after pushing the last arg.  */
-  if (argblock == 0 && !PUSH_ARGS_REVERSED)
-    anti_adjust_stack (GEN_INT (args_size.constant
-				- original_args_size.constant));
-
-  if (PUSH_ARGS_REVERSED)
-    argnum = nargs - 1;
-  else
-    argnum = 0;
+  argnum = nargs - 1;
 
   fun = prepare_call_address (NULL, fun, NULL, &call_fusage, 0, 0);
 
@@ -4097,7 +4055,7 @@  emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 
   /* ARGNUM indexes the ARGVEC array in the order in which the arguments
      are to be pushed.  */
-  for (count = 0; count < nargs; count++, argnum += inc)
+  for (count = 0; count < nargs; count++, argnum--)
     {
       enum machine_mode mode = argvec[argnum].mode;
       rtx val = argvec[argnum].value;
diff --git a/gcc/expr.c b/gcc/expr.c
index be62c53..2ef6c63 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -68,22 +68,6 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-address.h"
 #include "cfgexpand.h"
 
-/* Decide whether a function's arguments should be processed
-   from first to last or from last to first.
-
-   They should if the stack and args grow in opposite directions, but
-   only if we have push insns.  */
-
-#ifdef PUSH_ROUNDING
-
-#ifndef PUSH_ARGS_REVERSED
-#if defined (STACK_GROWS_DOWNWARD) != defined (ARGS_GROW_DOWNWARD)
-#define PUSH_ARGS_REVERSED	/* If it's last to first.  */
-#endif
-#endif
-
-#endif
-
 #ifndef STACK_PUSH_CODE
 #ifdef STACK_GROWS_DOWNWARD
 #define STACK_PUSH_CODE PRE_DEC
@@ -4354,11 +4338,7 @@  emit_push_insn (rtx x, enum machine_mode mode, tree type, rtx size,
       /* Loop over all the words allocated on the stack for this arg.  */
       /* We can do it by words, because any scalar bigger than a word
 	 has a size a multiple of a word.  */
-#ifndef PUSH_ARGS_REVERSED
-      for (i = not_stack; i < size; i++)
-#else
       for (i = size - 1; i >= not_stack; i--)
-#endif
 	if (i >= not_stack + offset)
 	  emit_push_insn (operand_subword_force (x, i, mode),
 			  word_mode, NULL_TREE, NULL_RTX, align, 0, NULL_RTX,