
[v2] Allocate constant size dynamic stack space in the prologue

Message ID 20160506093747.GA22977@linux.vnet.ibm.com
State New

Commit Message

Dominik Vogt May 6, 2016, 9:37 a.m. UTC
Updated version of the patch described below.  Apart from fixing a
bug and adding a test, the new logic is now always used, for all
targets.  The discussion of the original patch starts here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html

The new patch has been bootstrapped and regression tested on s390,
s390x and x86_64, but please check the questions/comments in the
follow-up message.

On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
> The attached patch fixes a warning during Linux kernel compilation
> on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> variables with constant size causing cfun->calls_alloca to be set
> (even if alloca is not used at all).  The patched code places
> constant size runtime aligned variables in the "virtual stack
> vars" area instead of creating a "virtual stack dynamic" area.
> 
> This behaviour is activated by defining
> 
>   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> 
> in the backend; otherwise the old logic is used.
> 
> The kernel uses runtime alignment for the page structure (aligned
> to 16 bytes), and apart from triggering the alloca warning
> (-mwarn-dynamicstack), the current GCC also generates inefficient
> code like
> 
>   aghi %r15,-160  # prologue: create stack frame
>   lgr %r11,%r15   # prologue: generate frame pointer
>   aghi %r15,-32   # space for dynamic stack
> 
> which could be simplified to
> 
>   aghi %r15,-192
> 
> (if later optimization passes are able to get rid of the frame
> pointer).  Is there a specific reason why the patched behaviour
> shouldn't be used for all platforms?
> 
> --
> 
> As the placement of runtime aligned stack variables with constant
> size is done completely in the middleend, I don't see a way to fix
> this in the backend.
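
To make the idea concrete, here is a minimal user-level sketch (not GCC
code) of the technique: reserve a constant-size block that is large
enough for the worst case, then align the base address inside it at run
time.  The names reserve_size, align_up and aligned_base are made up
for this illustration, and the sketch assumes power-of-two alignments
given in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: worst-case number of bytes to reserve so that an
   ALIGN-aligned object of SIZE bytes fits at any start address.  */
static size_t
reserve_size (size_t size, size_t align)
{
  return size + align - 1;
}

/* Hypothetical helper: round ADDR up to the next multiple of ALIGN,
   which must be a power of two.  */
static uintptr_t
align_up (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

/* Return the aligned base inside BLOCK, a buffer of at least
   reserve_size (size, align) bytes that lives at a fixed frame offset.  */
static void *
aligned_base (char *block, uintptr_t align)
{
  return (void *) align_up ((uintptr_t) block, align);
}
```

For a 32 byte object aligned to 16 bytes this reserves 47 bytes.  The
size is a compile-time constant, so in principle it can be part of the
normal frame instead of a separate dynamic allocation.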

Ciao

Dominik ^_^  ^_^

Comments

Dominik Vogt May 6, 2016, 9:44 a.m. UTC | #1
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 21f21c9..4d48afd 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
...
> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  
>        /* If there were any, allocate space.  */
>        if (large_size > 0)
> -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> -						   large_align, true);
> +	{
> +	  large_allocsize = GEN_INT (large_size);
> +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
...

See below.

> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	  /* Large alignment is only processed in the last pass.  */
>  	  if (pred)
>  	    continue;
> +
> +	  if (large_allocsize && ! large_allocation_done)
> +	    {
> +	      /* Allocate space in the virtual stack vars area in the prologue.
> +	       */
> +	      HOST_WIDE_INT loffset;
> +
> +	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
> +						 PREFERRED_STACK_BOUNDARY);

1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
2) Is this the right place for rounding up, or should 
   it be done above, maybe in get_dynamic_stack_size?

Not sure whether this is the right 

> +	      large_base = get_dynamic_stack_base (loffset, large_align);
> +	      large_allocation_done = true;
> +	    }
>  	  gcc_assert (large_base != NULL);
>  
>  	  large_alloc += alignb - 1;

> diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> new file mode 100644
> index 0000000..e06a16c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> @@ -0,0 +1,14 @@
> +/* Verify that run time aligned local variables are allocated in the prologue
> +   in one pass together with normal local variables.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O0" } */
> +
> +extern void bar (void *, void *, void *);
> +void foo (void)
> +{
> +  int i;
> +  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
> +  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
> +  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
> +}
> +/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */

I've no idea how to test this on other targets, or how to express
the test in a target independent way.  The scan-assembler-times
does not work on x86_64.
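
One target-independent possibility, sketched below under assumptions,
is a run-time check.  It only verifies that the over-aligned locals end
up correctly aligned; it cannot tell whether the space was allocated in
the prologue or whether a frame pointer was needed.  The alignments 256
and 128 and the names is_aligned and check_runtime_aligned_locals are
picked for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if P is a multiple of ALIGN bytes.  The
   pointer is read back through a volatile object so that the compiler
   cannot fold the check using the declared alignment.  */
static int
is_aligned (void *p, uintptr_t align)
{
  void *volatile q = p;
  return (uintptr_t) q % align == 0;
}

/* Return 1 if both over-aligned locals have the requested alignment.  */
int
check_runtime_aligned_locals (void)
{
  __attribute__ ((aligned (256))) char runtime_aligned_1[512];
  __attribute__ ((aligned (128))) char runtime_aligned_2[1024];

  /* Touch the arrays so they have to live in memory.  */
  runtime_aligned_1[0] = 1;
  runtime_aligned_2[0] = 2;
  return is_aligned (runtime_aligned_1, 256)
	 && is_aligned (runtime_aligned_2, 128);
}
```

Such a function could be called from main in a "dg-do run" test that
aborts when it returns 0, but that would complement the s390 assembler
scan rather than replace it.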

Ciao

Dominik ^_^  ^_^
Dominik Vogt June 20, 2016, 11:09 a.m. UTC | #2
On Fri, May 06, 2016 at 10:44:15AM +0100, Dominik Vogt wrote:
> [...]

Ciao

Dominik ^_^  ^_^
Jeff Law June 23, 2016, 4:43 a.m. UTC | #3
On 05/06/2016 03:37 AM, Dominik Vogt wrote:
> Updated version of the patch described below.  Apart from fixing a
> bug and adding a test, the new logic is now used always, for all
> targets.  The discussion of the original patch starts here:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html
>
> The new patch has been bootstrapped and regression tested on s390,
> s390x and x86_64, but please check the questions/comments in the
> follow up message.
>
> On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
>> > The attached patch fixes a warning during Linux kernel compilation
>> > on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
>> > variables with constant size causing cfun->calls_alloca to be set
>> > (even if alloca is not used at all).  The patched code places
>> > constant size runtime aligned variables in the "virtual stack
>> > vars" area instead of creating a "virtual stack dynamic" area.
>> >
>> > This behaviour is activated by defining
>> >
>> >   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
Is there some reason why we don't just do this unconditionally?  i.e., if 
we know the size of dynamic space, why not just always handle that in 
the prologue?  Seems like a useful optimization for a variety of reasons.

Of course if we do this unconditionally, we definitely need to find a 
way to test it better.

In reality, I don't see where/how the patch uses 
ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE anyway and it seems to be 
enabled for all targets, which is what I want :-)



>> >
>> > in the backend; otherwise the old logic is used.
>> >
>> > The kernel uses runtime alignment for the page structure (aligned
>> > to 16 bytes), and apart from triggering the alloca warning
>> > (-mwarn-dynamicstack), the current GCC also generates inefficient
>> > code like
>> >
>> >   aghi %r15,-160  # prologue: create stack frame
>> >   lgr %r11,%r15   # prologue: generate frame pointer
>> >   aghi %r15,-32   # space for dynamic stack
>> >
>> > which could be simplified to
>> >
>> >   aghi %r15,-192
>> >
>> > (if later optimization passes are able to get rid of the frame
>> > pointer).  Is there a specific reason why the patched behaviour
>> > shouldn't be used for all platforms?
>> >
>> > --
>> >
>> > As the placement of runtime aligned stack variables with constant
>> > size is done completely in the middleend, I don't see a way to fix
>> > this in the backend.
> Ciao
>
> Dominik ^_^  ^_^
>
> -- Dominik Vogt IBM Germany
>
>
> 0001-v2-ChangeLog
>
>
> gcc/ChangeLog
>
> 	* cfgexpand.c (expand_stack_vars): Implement dynamic stack space
> 	allocation in the prologue.
> 	* explow.c (get_dynamic_stack_base): New function to return an address
> 	expression for the dynamic stack base.
> 	(get_dynamic_stack_size): New function to do the required dynamic stack
> 	space size calculations.
> 	(allocate_dynamic_stack_space): Use new functions.
> 	(align_dynamic_address): Move some code from
> 	allocate_dynamic_stack_space to new function.
> 	* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
> gcc/testsuite/ChangeLog
>
> 	* gcc.target/s390/warn-dynamicstack-1.c: New test.
> 	* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
> 	* gcc.dg/stack-layout-dynamic-1.c: New test.
>
>
> 0001-v2-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch
>
>
> From e76a7e02f7862681d1b5344e64aca1b0a62cdc2c Mon Sep 17 00:00:00 2001
> From: Dominik Vogt <vogt@linux.vnet.ibm.com>
> Date: Wed, 25 Nov 2015 09:31:19 +0100
> Subject: [PATCH] Allocate constant size dynamic stack space in the
>  prologue ...
>
> ... and place it in the virtual stack vars area, if the platform supports it.
> On S/390 this saves adjusting the stack pointer twice and forcing the frame
> pointer into existence.  It also removes the warning with -mwarn-dynamicstack
> that is triggered by cfun->calls_alloca == 1.
>
> This fixes a problem with the Linux kernel which aligns the page structure to
> 16 bytes at run time using inefficient code and issuing a bogus warning.

> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	  /* Large alignment is only processed in the last pass.  */
>  	  if (pred)
>  	    continue;
> +
> +	  if (large_allocsize && ! large_allocation_done)
> +	    {
> +	      /* Allocate space in the virtual stack vars area in the prologue.
> +	       */
Line wrapping nit here.  Bring "prologue" down to the next line.

I really like this.  I think the big question is how do we test it.  I 
suspect our bootstrap and regression suite probably don't exercise this 
code in any significant way.

Jeff
Jeff Law June 23, 2016, 4:46 a.m. UTC | #4
On 05/06/2016 03:44 AM, Dominik Vogt wrote:
>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>> index 21f21c9..4d48afd 100644
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
> ...
>> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>
>>        /* If there were any, allocate space.  */
>>        if (large_size > 0)
>> -	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
>> -						   large_align, true);
>> +	{
>> +	  large_allocsize = GEN_INT (large_size);
>> +	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
> ...
>
> See below.
>
>> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>  	  /* Large alignment is only processed in the last pass.  */
>>  	  if (pred)
>>  	    continue;
>> +
>> +	  if (large_allocsize && ! large_allocation_done)
>> +	    {
>> +	      /* Allocate space in the virtual stack vars area in the prologue.
>> +	       */
>> +	      HOST_WIDE_INT loffset;
>> +
>> +	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
>> +						 PREFERRED_STACK_BOUNDARY);
>
> 1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
> 2) Is this the right place for rounding up, or should
>    it be done above, maybe in get_dynamic_stack_size?
I think PREFERRED_STACK_BOUNDARY is the correct one to use.

I think rounding in either place is fine.  We'd like to avoid multiple 
roundings, but otherwise I don't think it really matters.
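
As a small illustration of the rounding in question, written in plain C
rather than RTL: the sketch below rounds a size up with the same
add / truncating divide / multiply sequence that the patch uses for
addresses in align_dynamic_address.  The name round_up_size is made up,
and both arguments are in bytes here, which is an assumption of the
sketch (PREFERRED_STACK_BOUNDARY itself is expressed in bits).

```c
#include <assert.h>

/* Hypothetical helper: round SIZE up to the next multiple of BOUNDARY.
   Both values are in bytes; BOUNDARY must be nonzero.  */
static unsigned long
round_up_size (unsigned long size, unsigned long boundary)
{
  assert (boundary != 0);
  return (size + boundary - 1) / boundary * boundary;
}
```

Rounding an already rounded size leaves it unchanged, e.g.
round_up_size (round_up_size (30, 8), 8) is still 32, so the result is
the same wherever the rounding is done and a second rounding is merely
redundant.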

jeff
>
> Not sure whether this is the right
>
>> +	      large_base = get_dynamic_stack_base (loffset, large_align);
>> +	      large_allocation_done = true;
>> +	    }
>>  	  gcc_assert (large_base != NULL);
>>
>>  	  large_alloc += alignb - 1;
>
>> diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
>> new file mode 100644
>> index 0000000..e06a16c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
>> @@ -0,0 +1,14 @@
>> +/* Verify that run time aligned local variables are allocated in the prologue
>> +   in one pass together with normal local variables.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O0" } */
>> +
>> +extern void bar (void *, void *, void *);
>> +void foo (void)
>> +{
>> +  int i;
>> +  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
>> +  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
>> +  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
>> +}
>> +/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */
>
> I've no idea how to test this on other targets, or how to express
> the test in a target independent way.  The scan-assembler-times
> does not work on x86_64.
I wonder if you could force -fomit-frame-pointer and see if we still end 
up with a frame pointer?

jeff

Patch

From e76a7e02f7862681d1b5344e64aca1b0a62cdc2c Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c                                    |  20 +-
 gcc/explow.c                                       | 232 ++++++++++++++-------
 gcc/explow.h                                       |   9 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c      |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c               |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c          |  17 ++
 6 files changed, 212 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 21f21c9..4d48afd 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1052,7 +1052,9 @@  expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1099,8 +1101,10 @@  expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
       /* If there were any, allocate space.  */
       if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-						   large_align, true);
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
     }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,18 @@  expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  if (pred)
 	    continue;
+
+	  if (large_allocsize && ! large_allocation_done)
+	    {
+	      /* Allocate space in the virtual stack vars area in the prologue.
+	       */
+	      HOST_WIDE_INT loffset;
+
+	      loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
+						 PREFERRED_STACK_BOUNDARY);
+	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_allocation_done = true;
+	    }
 	  gcc_assert (large_base != NULL);
 
 	  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 1858597..7d13ed7 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1152,84 +1152,58 @@  record_new_stack_level (void)
     update_sjlj_context ();
 }
 
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+     but we know it can't.  So add ourselves and then do
+     TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
+				       Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
+			  gen_int_mode (required_align / BITS_PER_UNIT,
+					Pmode),
+			  NULL_RTX, 1);
+  target = expand_mult (Pmode, target,
+			gen_int_mode (required_align / BITS_PER_UNIT,
+				      Pmode),
+			NULL_RTX, 1);
 
-   Any required stack pointer alignment is preserved.
+  return target;
+}
 
-   SIZE is an rtx representing the size of the area.
+/* Return an rtx through *PSIZE, representing the size of an area of memory to
+   be dynamically pushed on the stack.  The bool return value of this function
+   indicates whether any alignment has been done.
+
+   *PSIZE is an rtx representing the size of the area.
 
    SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
-   parameter may be zero.  If so, a proper value will be extracted 
+   parameter may be zero.  If so, a proper value will be extracted
    from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
    of memory.
 
-   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
-   stack space allocated by the generated code cannot be added with itself
-   in the course of the execution of the function.  It is always safe to
-   pass FALSE here and the following criterion is sufficient in order to
-   pass TRUE: every path in the CFG that starts at the allocation point and
-   loops to it executes the associated deallocation code.  */
-
-rtx
-allocate_dynamic_stack_space (rtx size, unsigned size_align,
-			      unsigned required_align, bool cannot_accumulate)
+   If PSTACK_USAGE_SIZE is not NULL it points to a value that is increased for
+   the additional size returned.  */
+bool
+get_dynamic_stack_size (rtx *psize, unsigned size_align,
+			unsigned required_align,
+			HOST_WIDE_INT *pstack_usage_size)
 {
-  HOST_WIDE_INT stack_usage_size = -1;
-  rtx_code_label *final_label;
-  rtx final_target, target;
   unsigned extra_align = 0;
   unsigned extra = 0;
   bool must_align;
-
-  /* If we're asking for zero bytes, it doesn't matter what we point
-     to since we can't dereference it.  But return a reasonable
-     address anyway.  */
-  if (size == const0_rtx)
-    return virtual_stack_dynamic_rtx;
-
-  /* Otherwise, show we're calling alloca or equivalent.  */
-  cfun->calls_alloca = 1;
-
-  /* If stack usage info is requested, look into the size we are passed.
-     We need to do so this early to avoid the obfuscation that may be
-     introduced later by the various alignment operations.  */
-  if (flag_stack_usage_info)
-    {
-      if (CONST_INT_P (size))
-	stack_usage_size = INTVAL (size);
-      else if (REG_P (size))
-        {
-	  /* Look into the last emitted insn and see if we can deduce
-	     something for the register.  */
-	  rtx_insn *insn;
-	  rtx set, note;
-	  insn = get_last_insn ();
-	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
-	    {
-	      if (CONST_INT_P (SET_SRC (set)))
-		stack_usage_size = INTVAL (SET_SRC (set));
-	      else if ((note = find_reg_equal_equiv_note (insn))
-		       && CONST_INT_P (XEXP (note, 0)))
-		stack_usage_size = INTVAL (XEXP (note, 0));
-	    }
-	}
-
-      /* If the size is not constant, we can't say anything.  */
-      if (stack_usage_size == -1)
-	{
-	  current_function_has_unbounded_dynamic_stack_size = 1;
-	  stack_usage_size = 0;
-	}
-    }
+  rtx size = *psize;
 
   /* Ensure the size is in the proper mode.  */
   if (GET_MODE (size) != VOIDmode && GET_MODE (size) != Pmode)
     size = convert_to_mode (Pmode, size, 1);
 
-  /* Adjust SIZE_ALIGN, if needed.  */
   if (CONST_INT_P (size))
     {
       unsigned HOST_WIDE_INT lsb;
@@ -1289,8 +1263,8 @@  allocate_dynamic_stack_space (rtx size, unsigned size_align,
       size = plus_constant (Pmode, size, extra);
       size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage_info)
-	stack_usage_size += extra;
+      if (flag_stack_usage_info && pstack_usage_size)
+	*pstack_usage_size += extra;
 
       if (size_align > extra_align)
 	size_align = extra_align;
@@ -1313,13 +1287,93 @@  allocate_dynamic_stack_space (rtx size, unsigned size_align,
     {
       size = round_push (size, extra);
 
-      if (flag_stack_usage_info)
+      if (flag_stack_usage_info && pstack_usage_size)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
+	  *pstack_usage_size =
+	    (*pstack_usage_size + align - 1) / align * align;
+	}
+    }
+
+  *psize = size;
+
+  return must_align;
+}
+
+/* Return an rtx representing the address of an area of memory dynamically
+   pushed on the stack.
+
+   Any required stack pointer alignment is preserved.
+
+   SIZE is an rtx representing the size of the area.
+
+   SIZE_ALIGN is the alignment (in bits) that we know SIZE has.  This
+   parameter may be zero.  If so, a proper value will be extracted
+   from SIZE if it is constant, otherwise BITS_PER_UNIT will be assumed.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.
+
+   If CANNOT_ACCUMULATE is set to TRUE, the caller guarantees that the
+   stack space allocated by the generated code cannot be added with itself
+   in the course of the execution of the function.  It is always safe to
+   pass FALSE here and the following criterion is sufficient in order to
+   pass TRUE: every path in the CFG that starts at the allocation point and
+   loops to it executes the associated deallocation code.  */
+
+rtx
+allocate_dynamic_stack_space (rtx size, unsigned size_align,
+			      unsigned required_align, bool cannot_accumulate)
+{
+  HOST_WIDE_INT stack_usage_size = -1;
+  rtx_code_label *final_label;
+  rtx final_target, target;
+  bool must_align;
+
+  /* If we're asking for zero bytes, it doesn't matter what we point
+     to since we can't dereference it.  But return a reasonable
+     address anyway.  */
+  if (size == const0_rtx)
+    return virtual_stack_dynamic_rtx;
+
+  /* Otherwise, show we're calling alloca or equivalent.  */
+  cfun->calls_alloca = 1;
+
+  /* If stack usage info is requested, look into the size we are passed.
+     We need to do so this early to avoid the obfuscation that may be
+     introduced later by the various alignment operations.  */
+  if (flag_stack_usage_info)
+    {
+      if (CONST_INT_P (size))
+	stack_usage_size = INTVAL (size);
+      else if (REG_P (size))
+        {
+	  /* Look into the last emitted insn and see if we can deduce
+	     something for the register.  */
+	  rtx_insn *insn;
+	  rtx set, note;
+	  insn = get_last_insn ();
+	  if ((set = single_set (insn)) && rtx_equal_p (SET_DEST (set), size))
+	    {
+	      if (CONST_INT_P (SET_SRC (set)))
+		stack_usage_size = INTVAL (SET_SRC (set));
+	      else if ((note = find_reg_equal_equiv_note (insn))
+		       && CONST_INT_P (XEXP (note, 0)))
+		stack_usage_size = INTVAL (XEXP (note, 0));
+	    }
+	}
+
+      /* If the size is not constant, we can't say anything.  */
+      if (stack_usage_size == -1)
+	{
+	  current_function_has_unbounded_dynamic_stack_size = 1;
+	  stack_usage_size = 0;
 	}
     }
 
+  must_align = get_dynamic_stack_size (&size, size_align, required_align,
+				       &stack_usage_size);
+
   target = gen_reg_rtx (Pmode);
 
   /* The size is supposed to be fully adjusted at this point so record it
@@ -1486,23 +1540,7 @@  allocate_dynamic_stack_space (rtx size, unsigned size_align,
     }
 
   if (must_align)
-    {
-      /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
-	 but we know it can't.  So add ourselves and then do
-	 TRUNC_DIV_EXPR.  */
-      target = expand_binop (Pmode, add_optab, target,
-			     gen_int_mode (required_align / BITS_PER_UNIT - 1,
-					   Pmode),
-			     NULL_RTX, 1, OPTAB_LIB_WIDEN);
-      target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
-			      gen_int_mode (required_align / BITS_PER_UNIT,
-					    Pmode),
-			      NULL_RTX, 1);
-      target = expand_mult (Pmode, target,
-			    gen_int_mode (required_align / BITS_PER_UNIT,
-					  Pmode),
-			    NULL_RTX, 1);
-    }
+    target = align_dynamic_address (target, required_align);
 
   /* Now that we've committed to a return value, mark its alignment.  */
   mark_reg_pointer (target, required_align);
@@ -1512,6 +1550,38 @@  allocate_dynamic_stack_space (rtx size, unsigned size_align,
 
   return target;
 }
+
+/* Return an rtx representing the address of an area of memory already
+   statically pushed onto the stack in the virtual stack vars area.  (It is
+   assumed that the area is allocated in the function prologue.)
+
+   Any required stack pointer alignment is preserved.
+
+   OFFSET is the offset of the area into the virtual stack vars area.
+
+   REQUIRED_ALIGN is the alignment (in bits) required for the region
+   of memory.  */
+
+rtx
+get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+{
+  rtx target;
+
+  if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  target = gen_reg_rtx (Pmode);
+  emit_move_insn (target, virtual_stack_vars_rtx);
+  target = expand_binop (Pmode, add_optab, target,
+			 gen_int_mode (offset, Pmode),
+			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = align_dynamic_address (target, required_align);
+
+  /* Now that we've committed to a return value, mark its alignment.  */
+  mark_reg_pointer (target, required_align);
+
+  return target;
+}
 
 /* A front end may want to override GCC's stack checking by providing a
    run-time routine to call to check the stack, so provide a mechanism for
diff --git a/gcc/explow.h b/gcc/explow.h
index 52113db..6a89387 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -87,6 +87,15 @@  extern void record_new_stack_level (void);
 /* Allocate some space on the stack dynamically and return its address.  */
 extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned, bool);
 
+/* Calculate the necessary size of a constant dynamic stack allocation from the
+   size of the variable area.  */
+extern bool get_dynamic_stack_size (rtx *, unsigned, unsigned,
+				    HOST_WIDE_INT *);
+
+/* Returns the address of the dynamic stack space without allocating it.  */
+extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
+				   unsigned required_align);
+
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
 
diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 0000000..e06a16c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@ 
+/* Verify that run time aligned local variables are allocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/stack-usage-2.c b/gcc/testsuite/gcc.dg/stack-usage-2.c
index 86e2a65..a3dc522 100644
--- a/gcc/testsuite/gcc.dg/stack-usage-2.c
+++ b/gcc/testsuite/gcc.dg/stack-usage-2.c
@@ -16,7 +16,9 @@  int foo2 (void)  /* { dg-warning "stack usage is \[0-9\]* bytes" } */
   return 0;
 }
 
-int foo3 (void) /* { dg-warning "stack usage might be \[0-9\]* bytes" } */
+/* The actual warning depends on whether stack space is allocated dynamically
+   or statically.  */
+int foo3 (void) /* { dg-warning "stack usage (might be|is) \[0-9\]* bytes" } */
 {
   char arr[1024] __attribute__((aligned (512)));
   arr[0] = 1;
diff --git a/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
new file mode 100644
index 0000000..66913f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c
@@ -0,0 +1,17 @@ 
+/* Check that the stack pointer is decreased only once in a function with
+   runtime aligned stack variables and -mwarn-dynamicstack does not generate a
+   warning.  */
+
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -mwarn-dynamicstack" } */
+
+extern int bar (char *pl);
+
+int foo (long size)
+{
+  char __attribute__ ((aligned(16))) l = size;
+
+  return bar (&l);
+}
+
+/* { dg-final { scan-assembler-times "%r15,-" 1 } } */
-- 
2.3.0