Message ID | 20151125125610.GA19687@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
On 11/25/2015 01:56 PM, Dominik Vogt wrote: > The attached patch fixes a warning during Linux kernel compilation > on S/390 due to -mwarn-dynamicstack and runtime alignment of stack > variables with constant size causing cfun->calls_alloca to be set > (even if alloca is not used at all). The patched code places > constant size runtime aligned variables in the "virtual stack > vars" area instead of creating a "virtual stack dynamic" area. > > This behaviour is activated by defining > > #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1 > > in the backend; otherwise the old logic is used. > > The kernel uses runtime alignment for the page structure (aligned > to 16 bytes), Just curious, is that necessary or is it an optimization for statically allocated page structures? > * cfgexpand.c (expand_stack_vars): Implement > ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE. > * explow.c (get_dynamic_stack_base): New function to return an address > expression for the dynamic stack base when using > ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE. > (allocate_dynamic_stack_space): Use new functions. > (align_dynamic_address, adjust_size_align): Move some code > from allocate_dynamic_stack_space to new functions. > * explow.h (get_dynamic_stack_base): Export. > * doc/tm.texi (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Document. > * config/s390/s390.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define. > * defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define by > default. I think the approach is quite reasonable. Not sure whether this is appropriate for stage3 - it does look slightly risky and may not be worth it at this point for just fixing a warning. However, I don't think this should be a target-controlled thing, just make it use the new behaviour unconditionally. Also, in the future, when making something target-controlled, use a hook, not a macro. > + /* Allocate space in the prologue, at the beginning of the virtual > + stack vars area. */ Is this really at the beginning? What if FRAME_GROWS_DOWNWARD? > /* Common code used by allocate_dynamic_stack_space and get_dynamic_stack_base. > + Adjust SIZE_ALIGN for SIZE, if needed, and returns the updated value. */ The comment is meaningless. Adjust how? The new function get_dynamic_stack base looks like a shrunk-down version of allocate_dynamic_stack_space. What I'm worried about is that it makes various adjustments to the size, and these are not mirrored in expand_stack_vars. That function already has (after your patch) + size = large_size + large_align / BITS_PER_UNIT; So no further adjustment should be necessary. Right? > + extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT; allocate_dynamic_stack_space has extra_align here instead of the first BITS_PER_UNIT. Why isn't this retained (or, as pointed out above, why is any of this code here in the first place)? Bernd
On Wed, Nov 25, 2015 at 02:31:38PM +0100, Bernd Schmidt wrote: > On 11/25/2015 01:56 PM, Dominik Vogt wrote: > >The attached patch fixes a warning during Linux kernel compilation > >on S/390 due to -mwarn-dynamicstack and runtime alignment of stack > >variables with constant size causing cfun->calls_alloca to be set > >(even if alloca is not used at all). The patched code places > >constant size runtime aligned variables in the "virtual stack > >vars" area instead of creating a "virtual stack dynamic" area. > > > >This behaviour is activated by defining > > > > #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1 > > > >in the backend; otherwise the old logic is used. > > > >The kernel uses runtime alignment for the page structure (aligned > >to 16 bytes), > > Just curious, is that necessary or is it an optimization for > statically allocated page structures? Without looking into the details, I believe it's an optimization to have certain frequently used members of the struct always on the same cache line. > > * cfgexpand.c (expand_stack_vars): Implement > > ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE. > > * explow.c (get_dynamic_stack_base): New function to return an address > > expression for the dynamic stack base when using > > ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE. > > (allocate_dynamic_stack_space): Use new functions. > > (align_dynamic_address, adjust_size_align): Move some code > > from allocate_dynamic_stack_space to new functions. > > * explow.h (get_dynamic_stack_base): Export. > > * doc/tm.texi (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Document. > > * config/s390/s390.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define. > > * defaults.h (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE): Define by > > default. > > I think the approach is quite reasonable. Not sure whether this is > appropriate for stage3 - it does look slightly risky and may not be > worth it at this point for just fixing a warning. Our hope, was to justify this change with "does nothing except on S/390 for now". Most of the diff in explow.c is just putting common parts of allocate_dynamic_stack_space and get_dynamic_stack_base into subfunctions. It is possible to leave the existing function allocate_... untouched and just duplicate some more code in get_... and clean up the code duplication after Gcc6. > However, I don't think this should be a target-controlled thing, > just make it use the new behaviour unconditionally. Also, in the > future, when making something target-controlled, use a hook, not a > macro. All right. > >+ /* Allocate space in the prologue, at the beginning of the virtual > >+ stack vars area. */ > > Is this really at the beginning? What if FRAME_GROWS_DOWNWARD? I thought it was when I wrote the comment, but for S/390 it's actually placed what I'd call "after" the other stack variables (at a lower address). The comment is misleading. It does not matter whether the area ends up at the beginning or end. > >/* Common code used by allocate_dynamic_stack_space and get_dynamic_stack_base. > >+ Adjust SIZE_ALIGN for SIZE, if needed, and returns the updated value. */ > > The comment is meaningless. Adjust how? Yeah. See below. > The new function get_dynamic_stack base looks like a shrunk-down > version of allocate_dynamic_stack_space. It is. > What I'm worried about is > that it makes various adjustments to the size, and these are not > mirrored in expand_stack_vars. That function already has (after your > patch) > > + size = large_size + large_align / BITS_PER_UNIT; > > So no further adjustment should be necessary. Right? > > >+ extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT; > > allocate_dynamic_stack_space has extra_align here instead of the > first BITS_PER_UNIT. Why isn't this retained (or, as pointed out > above, why is any of this code here in the first place)? Actually this whole calculation with size, size_align and extra is bogus code. The calculated values are not used. This makes adjust_size_align superfluous (and thereby "fixes" the comment documenting that function). I have two more questions regarding code copied frpm allocate_dynamic_stack_space. 1. Is this really necessary in get_dynamic_stack_base? > + if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY) > + crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; 2. What is the effect of OPTAB_LIB_WIDEN below? I've copied it from the existing "plus" without actually understanding it. > + target = expand_binop (Pmode, add_optab, target, > + gen_int_mode (offset, Pmode), > + NULL_RTX, 1, OPTAB_LIB_WIDEN); I'll send an updated patch tomorrow. Thanks for your comments and your help. Ciao Dominik ^_^ ^_^
On 11/25/2015 03:52 PM, Dominik Vogt wrote: > Without looking into the details, I believe it's an optimization > to have certain frequently used members of the struct always on > the same cache line. Probably better to annotate the global vars then with an alignment, rather than waste space on the stack for locals by annotating the type. > I have two more questions regarding code copied frpm > allocate_dynamic_stack_space. > > 1. Is this really necessary in get_dynamic_stack_base? > >> + if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY) >> + crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; Good question. Maybe it's in the wrong place. I think this is used to get the difference between known and requested stack alignment, and thereby reduce the amount of extra space we have to allocate in order to ensure the requested alignment. (i.e. you want 16 bytes of alignment and your ABI guarantees 4, you allocate 12 additional bytes. That's the must_align stuff in allocate_dynamic_stack_space. You might want to copy the concept into your new code in expand_stack_vars. I'm thinking the code looking at STACK_POINTER_OFFSET/STACK_DYNAMIC_OFFSET is going to be irrelevant when you're looking at offsets from the frame pointer, so you just need to up the preferred stack boundary and use that. > 2. What is the effect of OPTAB_LIB_WIDEN below? I've copied it > from the existing "plus" without actually understanding it. > >> + target = expand_binop (Pmode, add_optab, target, >> + gen_int_mode (offset, Pmode), >> + NULL_RTX, 1, OPTAB_LIB_WIDEN); /* Passed to expand_simple_binop and expand_binop to say which options to try to use if the requested operation can't be open-coded on the requisite mode. Either OPTAB_LIB or OPTAB_LIB_WIDEN says try using a library call. Either OPTAB_WIDEN or OPTAB_LIB_WIDEN says try using a wider mode. OPTAB_MUST_WIDEN says try widening and don't try anything else. */ Bernd
From f96098e0f399b2b9c932f174a4050cab38233b6c Mon Sep 17 00:00:00 2001 From: Dominik Vogt <vogt@linux.vnet.ibm.com> Date: Mon, 23 Nov 2015 14:35:50 +0100 Subject: [PATCH] Allocate constant size dynamic stack space in the prologue ... ... and place it in the virtual stack vars area, if the platform supports it. On S/390 this saves adjusting the stack pointer twice and forcing the frame pointer into existence. It also removes the warning with -mwarn-dynamicstack that is triggered by cfun->calls_alloca == 1. This fixes a problem with the Linux kernel which aligns the page structure to 16 bytes at run time using inefficient code and issuing a bogus warning. --- gcc/cfgexpand.c | 21 +++++- gcc/config/s390/s390.h | 4 ++ gcc/defaults.h | 4 ++ gcc/doc/tm.texi | 6 ++ gcc/doc/tm.texi.in | 6 ++ gcc/explow.c | 129 ++++++++++++++++++++++++++--------- gcc/explow.h | 4 ++ gcc/testsuite/gcc.dg/stack-usage-2.c | 4 +- 8 files changed, 142 insertions(+), 36 deletions(-) diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 1990e10..bb49a0e 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -1079,8 +1079,25 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data) /* If there were any, allocate space. */ if (large_size > 0) - large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0, - large_align, true); + { + if (ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE) + { + HOST_WIDE_INT large_offset; + HOST_WIDE_INT size; + + size = large_size + large_align / BITS_PER_UNIT; + + /* Allocate space in the prologue, at the beginning of the virtual + stack vars area. */ + large_offset = alloc_stack_frame_space (size, 1); + large_base = get_dynamic_stack_base (GEN_INT (large_size), + large_offset, large_align); + } + else + /* Allocate space now. */ + large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0, + large_align, true); + } } for (si = 0; si < n; ++si) diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index a0faf13..073ce5c 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -594,6 +594,10 @@ extern const enum reg_class regclass_map[FIRST_PSEUDO_REGISTER]; #define STACK_DYNAMIC_OFFSET(FUNDECL) \ (STACK_POINTER_OFFSET + crtl->outgoing_args_size) +/* Constant size dynamic stack space can be allocated through the function + prologue to save the extra instructions to adjust the stack pointer. */ +#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1 + /* Offset of first parameter from the argument pointer register value. We have a fake argument pointer register that points directly to the argument area. */ diff --git a/gcc/defaults.h b/gcc/defaults.h index 0f1c713..abd6ab2 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1055,6 +1055,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define STACK_POINTER_OFFSET 0 #endif +#ifndef ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE +#define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 0 +#endif + #ifndef LOCAL_REGNO #define LOCAL_REGNO(REGNO) 0 #endif diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bde808b..921e98a 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -3015,6 +3015,12 @@ length of the outgoing arguments. The default is correct for most machines. See @file{function.c} for details. @end defmac +@defmac ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE +Define to non-zero to enable allocating constant dynamic stack space +through the function prologue. This affects only stack variables with +run time alignment. +@end defmac + @defmac INITIAL_FRAME_ADDRESS_RTX A C expression whose value is RTL representing the address of the initial stack frame. This address is passed to @code{RETURN_ADDR_RTX} and diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 0677fc1..5512577 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -2621,6 +2621,12 @@ length of the outgoing arguments. The default is correct for most machines. See @file{function.c} for details. @end defmac +@defmac ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE +Define to non-zero to enable allocating constant dynamic stack space +through the function prologue. This affects only stack variables with +run time alignment. +@end defmac + @defmac INITIAL_FRAME_ADDRESS_RTX A C expression whose value is RTL representing the address of the initial stack frame. This address is passed to @code{RETURN_ADDR_RTX} and diff --git a/gcc/explow.c b/gcc/explow.c index e6a69e0..c6848d3 100644 --- a/gcc/explow.c +++ b/gcc/explow.c @@ -1140,6 +1140,52 @@ record_new_stack_level (void) update_sjlj_context (); } +/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET. */ +static rtx +align_dynamic_address (rtx target, unsigned required_align) +{ + /* CEIL_DIV_EXPR needs to worry about the addition overflowing, + but we know it can't. So add ourselves and then do + TRUNC_DIV_EXPR. */ + target = expand_binop (Pmode, add_optab, target, + gen_int_mode (required_align / BITS_PER_UNIT - 1, + Pmode), + NULL_RTX, 1, OPTAB_LIB_WIDEN); + target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target, + gen_int_mode (required_align / BITS_PER_UNIT, + Pmode), + NULL_RTX, 1); + target = expand_mult (Pmode, target, + gen_int_mode (required_align / BITS_PER_UNIT, + Pmode), + NULL_RTX, 1); + + return target; +} + +/* Common code used by allocate_dynamic_stack_space and get_dynamic_stack_base. + Adjust SIZE_ALIGN for SIZE, if needed, and returns the updated value. */ +static unsigned +adjust_size_align (rtx size, unsigned size_align) +{ + if (CONST_INT_P (size)) + { + unsigned HOST_WIDE_INT lsb; + + lsb = INTVAL (size); + lsb &= -lsb; + + /* Watch out for overflow truncating to "unsigned". */ + if (lsb > UINT_MAX / BITS_PER_UNIT) + size_align = 1u << (HOST_BITS_PER_INT - 1); + else + size_align = (unsigned)lsb * BITS_PER_UNIT; + } + else if (size_align < BITS_PER_UNIT) + size_align = BITS_PER_UNIT; + return size_align; +} + /* Return an rtx representing the address of an area of memory dynamically pushed on the stack. @@ -1216,22 +1262,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align, if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode) size = convert_to_mode (Pmode, size, 1); - /* Adjust SIZE_ALIGN, if needed. */ - if (CONST_INT_P (size)) - { - unsigned HOST_WIDE_INT lsb; - - lsb = INTVAL (size); - lsb &= -lsb; - - /* Watch out for overflow truncating to "unsigned". */ - if (lsb > UINT_MAX / BITS_PER_UNIT) - size_align = 1u << (HOST_BITS_PER_INT - 1); - else - size_align = (unsigned)lsb * BITS_PER_UNIT; - } - else if (size_align < BITS_PER_UNIT) - size_align = BITS_PER_UNIT; + size_align = adjust_size_align (size, size_align); /* We can't attempt to minimize alignment necessary, because we don't know the final value of preferred_stack_boundary yet while executing @@ -1473,23 +1504,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align, } if (must_align) - { - /* CEIL_DIV_EXPR needs to worry about the addition overflowing, - but we know it can't. So add ourselves and then do - TRUNC_DIV_EXPR. */ - target = expand_binop (Pmode, add_optab, target, - gen_int_mode (required_align / BITS_PER_UNIT - 1, - Pmode), - NULL_RTX, 1, OPTAB_LIB_WIDEN); - target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target, - gen_int_mode (required_align / BITS_PER_UNIT, - Pmode), - NULL_RTX, 1); - target = expand_mult (Pmode, target, - gen_int_mode (required_align / BITS_PER_UNIT, - Pmode), - NULL_RTX, 1); - } + target = align_dynamic_address (target, required_align); /* Now that we've committed to a return value, mark its alignment. */ mark_reg_pointer (target, required_align); @@ -1499,6 +1514,54 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align, return target; } + +/* Return an rtx representing the address of an area of memory already + statically pushed onto the stack in the virtual stack vars area. (It is + assumed that the area is allocated in the function prologue.) + + Any required stack pointer alignment is preserved. + + SIZE is an rtx representing the size of the area. SIZE must be a constant + in VOIDmode or Pmode larger than zero. + + OFFSET is the offset of the area into the virtual stack vars area. + + REQUIRED_ALIGN is the alignment (in bits) required for the region + of memory. */ + +rtx +get_dynamic_stack_base (rtx size, HOST_WIDE_INT offset, unsigned required_align) +{ + rtx target; + unsigned size_align; + unsigned extra; + + size_align = adjust_size_align (size, 0); + + if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY) + crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; + + extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT; + size = plus_constant (Pmode, size, extra); + size = force_operand (size, NULL_RTX); + if (extra && size_align > BITS_PER_UNIT) + size_align = BITS_PER_UNIT; + + if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0) + size = round_push (size); + + target = gen_reg_rtx (Pmode); + emit_move_insn (target, virtual_stack_vars_rtx); + target = expand_binop (Pmode, add_optab, target, + gen_int_mode (offset, Pmode), + NULL_RTX, 1, OPTAB_LIB_WIDEN); + target = align_dynamic_address (target, required_align); + + /* Now that we've committed to a return value, mark its alignment. */ + mark_reg_pointer (target, required_align); + + return target; +} /* A front end may want to override GCC's stack checking by providing a run-time routine to call to check the stack, so provide a mechanism for diff --git a/gcc/explow.h b/gcc/explow.h index 52113db..aa42efb 100644 --- a/gcc/explow.h +++ b/gcc/explow.h @@ -87,6 +87,10 @@ extern void record_new_stack_level (void); /* Allocate some space on the stack dynamically and return its address. */ extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool); +/* Returns the address of the dynamic stack space without allocating it. */ +extern rtx get_dynamic_stack_base (rtx size, HOST_WIDE_INT offset, + unsigned required_align); + /* Emit one stack probe at ADDRESS, an address within the stack. */ extern void emit_stack_probe (rtx); diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c index c2527d2..7d246ec 100644 --- a/gcc/testsuite/gcc.dg/stack-usage-2.c +++ b/gcc/testsuite/gcc.dg/stack-usage-2.c @@ -17,7 +17,9 @@ int foo2 (void) /* { dg-warning "stack usage is \[0-9\]* bytes" } */ return 0; } -int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */ +/* The actual warning depends on whether stack space is allocated dynamically + or staically. */ +int foo3 (void) /* { dg-warning "stack usage (might be)|(is) \[0-9\]* bytes" } */ { char arr[1024] __attribute__((aligned (512))); arr[0] = 1; -- 2.3.0