Message ID | 1395661485-28060-1-git-send-email-james.greenhalgh@arm.com |
---|---|
State | New |
Headers | show |
On Mon, Mar 24, 2014 at 12:44 PM, James Greenhalgh <james.greenhalgh@arm.com> wrote: > > Hi, > > Consider this testcase: > > extern void foo (int a, int b, int c, int d); > > void > bar (int b, int c, int d) > { > foo (3, b, c, d); > } > > For many ABI's we will have on entry to the function > > r0 = b > r1 = c > r2 = d > > If we process arguments to the call to foo left-to-right > (PUSH_ARGS_REVERSED == 0) we assign temporaries, and then hard registers > in this order: > > t0 = 3 > t1 = r0 > t2 = r1 > t3 = r2 > > r0 = t0 > r1 = t1 > r2 = t2 > r3 = t3 > > We can't eliminate any of these temporaries as their live ranges overlap. > > However, If we process the arguments right-to-left (PUSH_ARGS_REVERSED == 1), > we get this order: > > t0 = 3 > t1 = r0 > t2 = r1 > t3 = r2 > > r3 = t3 > r2 = t2 > r1 = t1 > r0 = t0 > > And then we can start eliminating temporaries we don't need and get: > > r3 = r2 > r2 = r1 > r1 = r0 > r0 = 3 > > Which is much neater. > > It seems to me that this would probably be of benefit on all targets. > > This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default > for all targets. However, this would have a perceivable impact on argument > evaluation order (which is unspecified in C, but people have shown > sensitivity to it changing in the past and we suspect some people may > have built systems relying on the behviour... ho hum...). > > However, now that all side effects are handled when we are in SSA > form, it should be possible to refactor calls.c and expr.c as if > PUSH_ARGS_REVERSED were always defined to 1, without changing the > argument evaluation order in gimplify.c. > > In this way, we have the best of both worlds; we maintain argument > evaluation order and gain the optimizations given by removing > overlapping live ranges for parameters. > > I've bootstrapped and regression tested this on x86, ARM and AArch64 - but > I am well aware these are fairly standard targets, do you have any > suggestions or comments on which targets this change will affect? > > If not, is this OK for stage-1? Maybe somebody remembers why we have both paths. I'd rather make gimplification independent of this as well, choosing either variant. Do you have any hard numbers to compare? Say cc1 code size, when comparing PUSH_ARGS_REVERSED 1 vs 0? Thanks, Richard. > Thanks, > James > > --- > gcc/ > > 2014-03-21 James Greenhalgh <james.greenhalgh@arm.com> > > * calls.c (initialize_argument_information): Always treat > PUSH_ARGS_REVERSED as 1, simplify code accordingly. > (expand_call): Likewise. > (emit_library_call_calue_1): Likewise. > * expr.c (PUSH_ARGS_REVERSED): Do not define. > (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify > code accordingly.
Hi, On Mon, 24 Mar 2014, Richard Biener wrote: > Maybe somebody remembers why we have both paths. I'd rather make > gimplification independent of this as well, choosing either variant. I can't say why we have the !PUSH_ARGS_REVERSED path (so I guess that's the way that initially was there, before the distinction was made), but the PUSH_ARGS_REVERSED==1 path is important when you have push instructions and arguments grow in the opposite direction of the stack. The latter is the case for almost all targets, so as soon as you have push instructions (and no ACCUMULATE_OUTGOING_ARGS) you'll want to have PUSH_ARGS_REVERSED==1 as well. I guess the !PUSH_ARGS_REVERSED path only has advantages for the few targets where stack and args grow the same way and that use push instructions for argument passing, which currently are none. pa-64 does have same-direction arguments and stack (both upward), but doesn't use push, and the only other ARGS_GROW_DOWNWARD (pa-32bit and stormy16) also have !STACK_GROWS_DOWNWARD, so again opposite direction. Without push instructions or with ACCUMULATE_OUTGOING_ARGS I guess it doesn't matter much for parameters passed in memory (the offsets simply will be N to 0 instead of 0 to N), and for register arguments there's a slight advantage for PUSH_ARGS_REVERSED==1 for the usual case of same-ordered arguments passed down, like in James' example. > Do you have any hard numbers to compare? Say cc1 code size, when > comparing PUSH_ARGS_REVERSED 1 vs 0? If we'd agree on only one way, then it must be PUSH_ARGS_REVERSED==1. That would be a change in argument gimplification (and hence eval) order for many targets, and could result in unexpected side-effects for invalid code, like James said. IMO no reason to not try that in stage 1, though. Ciao, Michael.
On Mon, Mar 24, 2014 at 11:54:49AM +0000, Richard Biener wrote: > On Mon, Mar 24, 2014 at 12:44 PM, James Greenhalgh > <james.greenhalgh@arm.com> wrote: > > > > Hi, > > > > Consider this testcase: > > > > extern void foo (int a, int b, int c, int d); > > > > void > > bar (int b, int c, int d) > > { > > foo (3, b, c, d); > > } > > > > For many ABI's we will have on entry to the function > > > > r0 = b > > r1 = c > > r2 = d > > > > If we process arguments to the call to foo left-to-right > > (PUSH_ARGS_REVERSED == 0) we assign temporaries, and then hard registers > > in this order: > > > > t0 = 3 > > t1 = r0 > > t2 = r1 > > t3 = r2 > > > > r0 = t0 > > r1 = t1 > > r2 = t2 > > r3 = t3 > > > > We can't eliminate any of these temporaries as their live ranges overlap. > > > > However, If we process the arguments right-to-left (PUSH_ARGS_REVERSED == 1), > > we get this order: > > > > t0 = 3 > > t1 = r0 > > t2 = r1 > > t3 = r2 > > > > r3 = t3 > > r2 = t2 > > r1 = t1 > > r0 = t0 > > > > And then we can start eliminating temporaries we don't need and get: > > > > r3 = r2 > > r2 = r1 > > r1 = r0 > > r0 = 3 > > > > Which is much neater. > > > > It seems to me that this would probably be of benefit on all targets. > > > > This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default > > for all targets. However, this would have a perceivable impact on argument > > evaluation order (which is unspecified in C, but people have shown > > sensitivity to it changing in the past and we suspect some people may > > have built systems relying on the behviour... ho hum...). > > > > However, now that all side effects are handled when we are in SSA > > form, it should be possible to refactor calls.c and expr.c as if > > PUSH_ARGS_REVERSED were always defined to 1, without changing the > > argument evaluation order in gimplify.c. > > > > In this way, we have the best of both worlds; we maintain argument > > evaluation order and gain the optimizations given by removing > > overlapping live ranges for parameters. > > > > I've bootstrapped and regression tested this on x86, ARM and AArch64 - but > > I am well aware these are fairly standard targets, do you have any > > suggestions or comments on which targets this change will affect? > > > > If not, is this OK for stage-1? > > Maybe somebody remembers why we have both paths. I'd rather > make gimplification independent of this as well, choosing either > variant. > > Do you have any hard numbers to compare? Say cc1 code size, > when comparing PUSH_ARGS_REVERSED 1 vs 0? Sure, here is the output of size for AArch64 and ARM. Both of these targets have PUSH_ARGS_REVERSED == 0. i386 results are not interesting (on the order of tens of bytes), as PUSH_ARGS_REVERSED == 1 for that target. trunk is revision 208685 patched is revision 208685 with this patch applied. AArch64: text data bss dec hex filename 14546902 82424 781944 15411270 eb2846 trunk/gcc/cc1 14423510 82424 781944 15287878 e94646 patched/gcc/cc1 16721485 82480 805000 17608965 10cb105 trunk/gcc/cc1plus 16564613 82480 805000 17452093 10a4c3d patched/gcc/cc1plus 724550 7904 80744 813198 c688e trunk/gcc/gfortran 722606 7904 80744 811254 c60f6 patched/gcc/gfortran ARM: text data bss dec hex filename 15176835 31880 694132 15902847 f2a87f trunk/gcc/cc1 15171939 31880 694124 15897943 f29557 patched/gcc/cc1 17255818 31916 706404 17994138 112919a trunk/gcc/cc1plus 17250418 31916 706396 17988730 1127c7a patched/gcc/cc1plus 666307 4824 26780 697911 aa637 trunk/gcc/gfortran 664887 4824 26780 696491 aa0ab patched/gcc/gfortran AArch64 benefits more from the optimization, but there is also an improvement for ARM. Thanks, James > > --- > > gcc/ > > > > 2014-03-21 James Greenhalgh <james.greenhalgh@arm.com> > > > > * calls.c (initialize_argument_information): Always treat > > PUSH_ARGS_REVERSED as 1, simplify code accordingly. > > (expand_call): Likewise. > > (emit_library_call_calue_1): Likewise. > > * expr.c (PUSH_ARGS_REVERSED): Do not define. > > (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify > > code accordingly. >
On 03/24/2014 06:23 AM, Michael Matz wrote: > I can't say why we have the !PUSH_ARGS_REVERSED path (so I guess that's > the way that initially was there, before the distinction was made), but > the PUSH_ARGS_REVERSED==1 path is important when you have push > instructions and arguments grow in the opposite direction of the stack. > The latter is the case for almost all targets, so as soon as you have push > instructions (and no ACCUMULATE_OUTGOING_ARGS) you'll want to have > PUSH_ARGS_REVERSED==1 as well. > > I guess the !PUSH_ARGS_REVERSED path only has advantages for the few > targets where stack and args grow the same way and that use push > instructions for argument passing, which currently are none. pa-64 does > have same-direction arguments and stack (both upward), but doesn't use > push, and the only other ARGS_GROW_DOWNWARD (pa-32bit and stormy16) also > have !STACK_GROWS_DOWNWARD, so again opposite direction. > See http://en.wikipedia.org/wiki/X86_calling_conventions#pascal Since we don't actually support this anymore, we can certainly tidy this up. r~
Hi, On Mon, 24 Mar 2014, Richard Henderson wrote: > See > > http://en.wikipedia.org/wiki/X86_calling_conventions#pascal > > Since we don't actually support this anymore, we can certainly tidy this > up. Yeah, I thought about that, but I couldn't see how that could have used PUSH_ARGS_REVERSED. The above calling convention would be a language setting while the macro is a target setting, and for interoperability even the Pascal compiler would have to be able to create some calls with reversed arguments. So I thought that Pascal never was the purpose for the macro. (I didn't check to make sure, though) Ciao, Michael.
On 03/24/14 05:44, James Greenhalgh wrote: > Which is much neater. > > It seems to me that this would probably be of benefit on all targets. > > This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default > for all targets. However, this would have a perceivable impact on argument > evaluation order (which is unspecified in C, but people have shown > sensitivity to it changing in the past and we suspect some people may > have built systems relying on the behviour... ho hum...). > > However, now that all side effects are handled when we are in SSA > form, it should be possible to refactor calls.c and expr.c as if > PUSH_ARGS_REVERSED were always defined to 1, without changing the > argument evaluation order in gimplify.c. > > In this way, we have the best of both worlds; we maintain argument > evaluation order and gain the optimizations given by removing > overlapping live ranges for parameters. > > I've bootstrapped and regression tested this on x86, ARM and AArch64 - but > I am well aware these are fairly standard targets, do you have any > suggestions or comments on which targets this change will affect? > > If not, is this OK for stage-1? > > Thanks, > James > > --- > gcc/ > > 2014-03-21 James Greenhalgh <james.greenhalgh@arm.com> > > * calls.c (initialize_argument_information): Always treat > PUSH_ARGS_REVERSED as 1, simplify code accordingly. > (expand_call): Likewise. > (emit_library_call_calue_1): Likewise. > * expr.c (PUSH_ARGS_REVERSED): Do not define. > (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify > code accordingly. This is fine for the trunk at this point. Sorry for the delay. jeff
> > 2014-03-21 James Greenhalgh <james.greenhalgh@arm.com> > > > > * calls.c (initialize_argument_information): Always treat > > PUSH_ARGS_REVERSED as 1, simplify code accordingly. > > (expand_call): Likewise. > > (emit_library_call_calue_1): Likewise. > > * expr.c (PUSH_ARGS_REVERSED): Do not define. > > (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify > > code accordingly. > > This is fine for the trunk at this point. Are you sure that it's not a correctness issue for some targets though?
On 26/04/14 14:25, Eric Botcazou wrote: >>> 2014-03-21 James Greenhalgh <james.greenhalgh@arm.com> >>> >>> * calls.c (initialize_argument_information): Always treat >>> PUSH_ARGS_REVERSED as 1, simplify code accordingly. >>> (expand_call): Likewise. >>> (emit_library_call_calue_1): Likewise. >>> * expr.c (PUSH_ARGS_REVERSED): Do not define. >>> (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify >>> code accordingly. >> >> This is fine for the trunk at this point. > > Are you sure that it's not a correctness issue for some targets though? > Ordering of side-effects these days are handled during gimplification. If there are any correctness issues relating to this, then I think we've got a bigger problem. R.
On Fri, Apr 25, 2014 at 09:15:57PM +0100, Jeff Law wrote: > On 03/24/14 05:44, James Greenhalgh wrote: > > Which is much neater. > > > > It seems to me that this would probably be of benefit on all targets. > > > > This would be an argument for setting PUSH_ARGS_REVERSED to 1 by default > > for all targets. However, this would have a perceivable impact on argument > > evaluation order (which is unspecified in C, but people have shown > > sensitivity to it changing in the past and we suspect some people may > > have built systems relying on the behviour... ho hum...). > > > > However, now that all side effects are handled when we are in SSA > > form, it should be possible to refactor calls.c and expr.c as if > > PUSH_ARGS_REVERSED were always defined to 1, without changing the > > argument evaluation order in gimplify.c. > > > > In this way, we have the best of both worlds; we maintain argument > > evaluation order and gain the optimizations given by removing > > overlapping live ranges for parameters. > > > > I've bootstrapped and regression tested this on x86, ARM and AArch64 - but > > I am well aware these are fairly standard targets, do you have any > > suggestions or comments on which targets this change will affect? > > > > If not, is this OK for stage-1? > > > > Thanks, > > James > > > > --- > > gcc/ > > > > 2014-03-21 James Greenhalgh <james.greenhalgh@arm.com> > > > > * calls.c (initialize_argument_information): Always treat > > PUSH_ARGS_REVERSED as 1, simplify code accordingly. > > (expand_call): Likewise. > > (emit_library_call_calue_1): Likewise. > > * expr.c (PUSH_ARGS_REVERSED): Do not define. > > (emit_push_insn): Always treat PUSH_ARGS_REVERSED as 1, simplify > > code accordingly. > This is fine for the trunk at this point. > > Sorry for the delay. Thanks Jeff. I've committed this as revision 209897 after checking it still bootstraps across arm/aarch64/i386. I'll look out for any fallout on other targets, please shout if I've broken something! Thanks, James
diff --git a/gcc/calls.c b/gcc/calls.c index f392319..2e91918 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1104,8 +1104,6 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, { CUMULATIVE_ARGS *args_so_far_pnt = get_cumulative_args (args_so_far); location_t loc = EXPR_LOCATION (exp); - /* 1 if scanning parms front to back, -1 if scanning back to front. */ - int inc; /* Count arg position in order args appear. */ int argpos; @@ -1116,22 +1114,9 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, args_size->var = 0; /* In this loop, we consider args in the order they are written. - We fill up ARGS from the front or from the back if necessary - so that in any case the first arg to be pushed ends up at the front. */ + We fill up ARGS from the back. */ - if (PUSH_ARGS_REVERSED) - { - i = num_actuals - 1, inc = -1; - /* In this case, must reverse order of args - so that we compute and push the last arg first. */ - } - else - { - i = 0, inc = 1; - } - - /* First fill in the actual arguments in the ARGS array, splitting - complex arguments if necessary. */ + i = num_actuals - 1; { int j = i; call_expr_arg_iterator iter; @@ -1140,7 +1125,7 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; - j += inc; + j--; } FOR_EACH_CALL_EXPR_ARG (arg, iter, exp) { @@ -1152,17 +1137,17 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, { tree subtype = TREE_TYPE (argtype); args[j].tree_value = build1 (REALPART_EXPR, subtype, arg); - j += inc; + j--; args[j].tree_value = build1 (IMAGPART_EXPR, subtype, arg); } else args[j].tree_value = arg; - j += inc; + j--; } } /* I counts args in order (to be) pushed; ARGPOS counts in order written. */ - for (argpos = 0; argpos < num_actuals; i += inc, argpos++) + for (argpos = 0; argpos < num_actuals; i--, argpos++) { tree type = TREE_TYPE (args[i].tree_value); int unsignedp; @@ -2952,9 +2937,8 @@ expand_call (tree exp, rtx target, int ignore) compute_argument_addresses (args, argblock, num_actuals); - /* If we push args individually in reverse order, perform stack alignment - before the first push (the last arg). */ - if (PUSH_ARGS_REVERSED && argblock == 0 + /* Perform stack alignment before the first push (the last arg). */ + if (argblock == 0 && adjusted_args_size.constant > reg_parm_stack_space && adjusted_args_size.constant != unadjusted_args_size) { @@ -3097,12 +3081,6 @@ expand_call (tree exp, rtx target, int ignore) sibcall_failure = 1; } - /* If we pushed args in forward order, perform stack alignment - after pushing the last arg. */ - if (!PUSH_ARGS_REVERSED && argblock == 0) - anti_adjust_stack (GEN_INT (adjusted_args_size.constant - - unadjusted_args_size)); - /* If register arguments require space on the stack and stack space was not preallocated, allocate stack space here for arguments passed in registers. */ @@ -3152,8 +3130,7 @@ expand_call (tree exp, rtx target, int ignore) if (pass == 1 && (return_flags & ERF_RETURNS_ARG)) { int arg_nr = return_flags & ERF_RETURN_ARG_MASK; - if (PUSH_ARGS_REVERSED) - arg_nr = num_actuals - arg_nr - 1; + arg_nr = num_actuals - arg_nr - 1; if (arg_nr >= 0 && arg_nr < num_actuals && args[arg_nr].reg @@ -3597,7 +3574,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, isn't present here, so we default to native calling abi here. */ tree fndecl ATTRIBUTE_UNUSED = NULL_TREE; /* library calls default to host calling abi ? */ tree fntype ATTRIBUTE_UNUSED = NULL_TREE; /* library calls default to host calling abi ? */ - int inc; int count; rtx argblock = 0; CUMULATIVE_ARGS args_so_far_v; @@ -3946,22 +3922,13 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, argblock = push_block (GEN_INT (args_size.constant), 0, 0); } - /* If we push args individually in reverse order, perform stack alignment + /* We push args individually in reverse order, perform stack alignment before the first push (the last arg). */ - if (argblock == 0 && PUSH_ARGS_REVERSED) + if (argblock == 0) anti_adjust_stack (GEN_INT (args_size.constant - original_args_size.constant)); - if (PUSH_ARGS_REVERSED) - { - inc = -1; - argnum = nargs - 1; - } - else - { - inc = 1; - argnum = 0; - } + argnum = nargs - 1; #ifdef REG_PARM_STACK_SPACE if (ACCUMULATE_OUTGOING_ARGS) @@ -3978,7 +3945,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, /* ARGNUM indexes the ARGVEC array in the order in which the arguments are to be pushed. */ - for (count = 0; count < nargs; count++, argnum += inc) + for (count = 0; count < nargs; count++, argnum--) { enum machine_mode mode = argvec[argnum].mode; rtx val = argvec[argnum].value; @@ -4080,16 +4047,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, } } - /* If we pushed args in forward order, perform stack alignment - after pushing the last arg. */ - if (argblock == 0 && !PUSH_ARGS_REVERSED) - anti_adjust_stack (GEN_INT (args_size.constant - - original_args_size.constant)); - - if (PUSH_ARGS_REVERSED) - argnum = nargs - 1; - else - argnum = 0; + argnum = nargs - 1; fun = prepare_call_address (NULL, fun, NULL, &call_fusage, 0, 0); @@ -4097,7 +4055,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, /* ARGNUM indexes the ARGVEC array in the order in which the arguments are to be pushed. */ - for (count = 0; count < nargs; count++, argnum += inc) + for (count = 0; count < nargs; count++, argnum--) { enum machine_mode mode = argvec[argnum].mode; rtx val = argvec[argnum].value; diff --git a/gcc/expr.c b/gcc/expr.c index be62c53..2ef6c63 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -68,22 +68,6 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-address.h" #include "cfgexpand.h" -/* Decide whether a function's arguments should be processed - from first to last or from last to first. - - They should if the stack and args grow in opposite directions, but - only if we have push insns. */ - -#ifdef PUSH_ROUNDING - -#ifndef PUSH_ARGS_REVERSED -#if defined (STACK_GROWS_DOWNWARD) != defined (ARGS_GROW_DOWNWARD) -#define PUSH_ARGS_REVERSED /* If it's last to first. */ -#endif -#endif - -#endif - #ifndef STACK_PUSH_CODE #ifdef STACK_GROWS_DOWNWARD #define STACK_PUSH_CODE PRE_DEC @@ -4354,11 +4338,7 @@ emit_push_insn (rtx x, enum machine_mode mode, tree type, rtx size, /* Loop over all the words allocated on the stack for this arg. */ /* We can do it by words, because any scalar bigger than a word has a size a multiple of a word. */ -#ifndef PUSH_ARGS_REVERSED - for (i = not_stack; i < size; i++) -#else for (i = size - 1; i >= not_stack; i--) -#endif if (i >= not_stack + offset) emit_push_insn (operand_subword_force (x, i, mode), word_mode, NULL_TREE, NULL_RTX, align, 0, NULL_RTX,