diff mbox series

Don't adjust DECL_ALIGN of variables during estimated_stack_frame_size (PR target/90811)

Message ID 20190612072359.GR19695@tucnak
State New
Headers show
Series Don't adjust DECL_ALIGN of variables during estimated_stack_frame_size (PR target/90811) | expand

Commit Message

Jakub Jelinek June 12, 2019, 7:23 a.m. UTC
Hi!

Apparently estimated_stack_frame_size -> expand_one_var -> add_stack_var
-> align_local_variable changes DECL_ALIGN of stack variables from
LOCAL_DECL_ALIGNMENT.

That is unfortunately very undesirable for offloading, because this happens
early during IPA where we stream out LTO bytecode for the target offloading
after it, so on the PR90811 testcase x86_64 backend says a local variable
would be best 128-bit aligned, but in the PTX backend such alignment means
runtime (virtual) stack adjustments because only 64-bit alignment is
guaranteed.

The following patch makes sure we don't update DECL_ALIGN during stack frame
size estimation, just during actual expansion.

Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested
with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix
reverted to verify the overaligned variables are gone.  Ok for trunk?

Or, is there something in GIMPLE passes that would benefit from the updated
DECL_ALIGN during post-IPA optimizations, rather than just in RTL passes?
If yes, we should call those somewhere post-IPA on top of this patch.
Nothing in the testsuite showed it though.

2019-06-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/90811
	* cfgexpand.c (align_local_variable): Add really_expand argument,
	don't SET_DECL_ALIGN if it is false.
	(add_stack_var): Add really_expand argument, pass it through to
	align_local_variable.
	(expand_one_stack_var_1): Pass true as really_expand to
	align_local_variable.
	(expand_one_ssa_partition): Pass true as really_expand to
	add_stack_var.
	(expand_one_var): Pass really_expand through to add_stack_var.


	Jakub

Comments

Richard Biener June 12, 2019, 9 a.m. UTC | #1
On Wed, 12 Jun 2019, Jakub Jelinek wrote:

> Hi!
> 
> Apparently estimated_stack_frame_size -> expand_one_var -> add_stack_var
> -> align_local_variable changes DECL_ALIGN of stack variables from
> LOCAL_DECL_ALIGNMENT.
> 
> That is unfortunately very undesirable for offloading, because this happens
> early during IPA where we stream out LTO bytecode for the target offloading
> after it, so on the PR90811 testcase x86_64 backend says a local variable
> would be best 128-bit aligned, but in the PTX backend such alignment means
> runtime (virtual) stack adjustments because only 64-bit alignment is
> guaranteed.
> 
> The following patch makes sure we don't update DECL_ALIGN during stack frame
> size estimation, just during actual expansion.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested
> with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix
> reverted to verify the overaligned variables are gone.  Ok for trunk?

Isn't there hunk missing that actually passes false?

> Or, is there something in GIMPLE passes that would benefit from the updated
> DECL_ALIGN during post-IPA optimizations, rather than just in RTL passes?
> If yes, we should call those somewhere post-IPA on top of this patch.
> Nothing in the testsuite showed it though.

I guess only CCPs bit-value/alignment tracking sees the difference.

Note we are already aligning variables in stor-layout.c with target
information so in some way this isn't a real fix.  Offloading as
implemented right now really leaks target dependent decisions from
the target to the offload target.

Not opposed to the change though a comment in the align_local_variable
hunk as to why we only adjust DECL_ALIGN late would be appreciated.

Richard.

> 2019-06-12  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/90811
> 	* cfgexpand.c (align_local_variable): Add really_expand argument,
> 	don't SET_DECL_ALIGN if it is false.
> 	(add_stack_var): Add really_expand argument, pass it through to
> 	align_local_variable.
> 	(expand_one_stack_var_1): Pass true as really_expand to
> 	align_local_variable.
> 	(expand_one_ssa_partition): Pass true as really_expand to
> 	add_stack_var.
> 	(expand_one_var): Pass really_expand through to add_stack_var.
> 
> --- gcc/cfgexpand.c.jj	2019-05-20 11:39:34.059816432 +0200
> +++ gcc/cfgexpand.c	2019-06-11 11:28:08.720932421 +0200
> @@ -361,7 +361,7 @@ static bool has_short_buffer;
>     we can't do with expected alignment of the stack boundary.  */
>  
>  static unsigned int
> -align_local_variable (tree decl)
> +align_local_variable (tree decl, bool really_expand)
>  {
>    unsigned int align;
>  
> @@ -370,7 +370,8 @@ align_local_variable (tree decl)
>    else
>      {
>        align = LOCAL_DECL_ALIGNMENT (decl);
> -      SET_DECL_ALIGN (decl, align);
> +      if (really_expand)
> +	SET_DECL_ALIGN (decl, align);
>      }
>    return align / BITS_PER_UNIT;
>  }
> @@ -418,7 +419,7 @@ alloc_stack_frame_space (poly_int64 size
>  /* Accumulate DECL into STACK_VARS.  */
>  
>  static void
> -add_stack_var (tree decl)
> +add_stack_var (tree decl, bool really_expand)
>  {
>    struct stack_var *v;
>  
> @@ -446,7 +447,7 @@ add_stack_var (tree decl)
>       variables that are simultaneously live.  */
>    if (known_eq (v->size, 0U))
>      v->size = 1;
> -  v->alignb = align_local_variable (decl);
> +  v->alignb = align_local_variable (decl, really_expand);
>    /* An alignment of zero can mightily confuse us later.  */
>    gcc_assert (v->alignb != 0);
>  
> @@ -1323,7 +1324,7 @@ expand_one_stack_var_1 (tree var)
>    else
>      {
>        size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var));
> -      byte_align = align_local_variable (var);
> +      byte_align = align_local_variable (var, true);
>      }
>  
>    /* We handle highly aligned variables in expand_stack_vars.  */
> @@ -1413,7 +1414,7 @@ expand_one_ssa_partition (tree var)
>    if (!use_register_for_decl (var))
>      {
>        if (defer_stack_allocation (var, true))
> -	add_stack_var (var);
> +	add_stack_var (var, true);
>        else
>  	expand_one_stack_var_1 (var);
>        return;
> @@ -1695,7 +1696,7 @@ expand_one_var (tree var, bool toplevel,
>  	}
>      }
>    else if (defer_stack_allocation (var, toplevel))
> -    add_stack_var (origvar);
> +    add_stack_var (origvar, really_expand);
>    else
>      {
>        if (really_expand)
> 
> 	Jakub
>
Jakub Jelinek June 12, 2019, 9:14 a.m. UTC | #2
On Wed, Jun 12, 2019 at 11:00:43AM +0200, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested
> > with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix
> > reverted to verify the overaligned variables are gone.  Ok for trunk?
> 
> Isn't there hunk missing that actually passes false?

It is that
+    add_stack_var (origvar, really_expand);
expand_one_var is called 3 times:
      expand_one_var (t, toplevel, true);
      size += expand_one_var (var, true, false);
        expand_one_var (var, true, true);
where the second one is the estimated_stack_frame_size call where it passes
down false as really_expand.  And the patch passes it down through to
align_local_variable.

> I guess only CCPs bit-value/alignment tracking sees the difference.
> 
> Note we are already aligning variables in stor-layout.c with target
> information so in some way this isn't a real fix.  Offloading as
> implemented right now really leaks target dependent decisions from
> the target to the offload target.

I was thinking about only honoring DECL_ALIGN for non-FIELD_DECLs when
streaming in ACCEL_COMPILER only if DECL_USER_ALIGN, the risk is if
we bump DECL_ALIGN somewhere pre-IPA on the host side and some optimization
actually optimizes code using that DECL_ALIGN, later on we stream offloading
out and in, downgrade DECL_ALIGN and suddenly the optimization was invalid.

> Not opposed to the change though a comment in the align_local_variable
> hunk as to why we only adjust DECL_ALIGN late would be appreciated.

Sure, that can be done:

> > --- gcc/cfgexpand.c.jj	2019-05-20 11:39:34.059816432 +0200
> > +++ gcc/cfgexpand.c	2019-06-11 11:28:08.720932421 +0200
> > @@ -361,7 +361,7 @@ static bool has_short_buffer;
> >     we can't do with expected alignment of the stack boundary.  */
> >  
> >  static unsigned int
> > -align_local_variable (tree decl)
> > +align_local_variable (tree decl, bool really_expand)
> >  {
> >    unsigned int align;
> >  
> > @@ -370,7 +370,8 @@ align_local_variable (tree decl)
> >    else
> >      {
> >        align = LOCAL_DECL_ALIGNMENT (decl);
> > -      SET_DECL_ALIGN (decl, align);

+      /* Don't change DECL_ALIGN when called from estimated_stack_frame_size.
+	 That is done before IPA and could bump alignment based on host
+	 backend even for offloaded code which wants different
+	 LOCAL_DECL_ALIGNMENT.  */

> > +      if (really_expand)
> > +	SET_DECL_ALIGN (decl, align);
> >      }
> >    return align / BITS_PER_UNIT;
> >  }

	Jakub
Richard Biener June 12, 2019, 9:21 a.m. UTC | #3
On Wed, 12 Jun 2019, Jakub Jelinek wrote:

> On Wed, Jun 12, 2019 at 11:00:43AM +0200, Richard Biener wrote:
> > > Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested
> > > with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix
> > > reverted to verify the overaligned variables are gone.  Ok for trunk?
> > 
> > Isn't there hunk missing that actually passes false?
> 
> It is that
> +    add_stack_var (origvar, really_expand);
> expand_one_var is called 3 times:
>       expand_one_var (t, toplevel, true);
>       size += expand_one_var (var, true, false);
>         expand_one_var (var, true, true);
> where the second one is the estimated_stack_frame_size call where it passes
> down false as really_expand.  And the patch passes it down through to
> align_local_variable.

Ah, OK.

> > I guess only CCPs bit-value/alignment tracking sees the difference.
> > 
> > Note we are already aligning variables in stor-layout.c with target
> > information so in some way this isn't a real fix.  Offloading as
> > implemented right now really leaks target dependent decisions from
> > the target to the offload target.
> 
> I was thinking about only honoring DECL_ALIGN for non-FIELD_DECLs when
> streaming in ACCEL_COMPILER only if DECL_USER_ALIGN, the risk is if
> we bump DECL_ALIGN somewhere pre-IPA on the host side and some optimization
> actually optimizes code using that DECL_ALIGN, later on we stream offloading
> out and in, downgrade DECL_ALIGN and suddenly the optimization was invalid.

Yeah, I think that's too dangerous.  We could stream the offload part
before any early optimization.  Or mark offload functions and skip
the early pipeline for them, also not do IPA on them.  But I guess
optimizing, esp. inlining into offloaded functions is important.

> > Not opposed to the change though a comment in the align_local_variable
> > hunk as to why we only adjust DECL_ALIGN late would be appreciated.
> 
> Sure, that can be done:
> 
> > > --- gcc/cfgexpand.c.jj	2019-05-20 11:39:34.059816432 +0200
> > > +++ gcc/cfgexpand.c	2019-06-11 11:28:08.720932421 +0200
> > > @@ -361,7 +361,7 @@ static bool has_short_buffer;
> > >     we can't do with expected alignment of the stack boundary.  */
> > >  
> > >  static unsigned int
> > > -align_local_variable (tree decl)
> > > +align_local_variable (tree decl, bool really_expand)
> > >  {
> > >    unsigned int align;
> > >  
> > > @@ -370,7 +370,8 @@ align_local_variable (tree decl)
> > >    else
> > >      {
> > >        align = LOCAL_DECL_ALIGNMENT (decl);
> > > -      SET_DECL_ALIGN (decl, align);
> 
> +      /* Don't change DECL_ALIGN when called from estimated_stack_frame_size.
> +	 That is done before IPA and could bump alignment based on host
> +	 backend even for offloaded code which wants different
> +	 LOCAL_DECL_ALIGNMENT.  */
> 
> > > +      if (really_expand)
> > > +	SET_DECL_ALIGN (decl, align);

Looks good to me with this change.

Thanks,
Richard.
diff mbox series

Patch

--- gcc/cfgexpand.c.jj	2019-05-20 11:39:34.059816432 +0200
+++ gcc/cfgexpand.c	2019-06-11 11:28:08.720932421 +0200
@@ -361,7 +361,7 @@  static bool has_short_buffer;
    we can't do with expected alignment of the stack boundary.  */
 
 static unsigned int
-align_local_variable (tree decl)
+align_local_variable (tree decl, bool really_expand)
 {
   unsigned int align;
 
@@ -370,7 +370,8 @@  align_local_variable (tree decl)
   else
     {
       align = LOCAL_DECL_ALIGNMENT (decl);
-      SET_DECL_ALIGN (decl, align);
+      if (really_expand)
+	SET_DECL_ALIGN (decl, align);
     }
   return align / BITS_PER_UNIT;
 }
@@ -418,7 +419,7 @@  alloc_stack_frame_space (poly_int64 size
 /* Accumulate DECL into STACK_VARS.  */
 
 static void
-add_stack_var (tree decl)
+add_stack_var (tree decl, bool really_expand)
 {
   struct stack_var *v;
 
@@ -446,7 +447,7 @@  add_stack_var (tree decl)
      variables that are simultaneously live.  */
   if (known_eq (v->size, 0U))
     v->size = 1;
-  v->alignb = align_local_variable (decl);
+  v->alignb = align_local_variable (decl, really_expand);
   /* An alignment of zero can mightily confuse us later.  */
   gcc_assert (v->alignb != 0);
 
@@ -1323,7 +1324,7 @@  expand_one_stack_var_1 (tree var)
   else
     {
       size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var));
-      byte_align = align_local_variable (var);
+      byte_align = align_local_variable (var, true);
     }
 
   /* We handle highly aligned variables in expand_stack_vars.  */
@@ -1413,7 +1414,7 @@  expand_one_ssa_partition (tree var)
   if (!use_register_for_decl (var))
     {
       if (defer_stack_allocation (var, true))
-	add_stack_var (var);
+	add_stack_var (var, true);
       else
 	expand_one_stack_var_1 (var);
       return;
@@ -1695,7 +1696,7 @@  expand_one_var (tree var, bool toplevel,
 	}
     }
   else if (defer_stack_allocation (var, toplevel))
-    add_stack_var (origvar);
+    add_stack_var (origvar, really_expand);
   else
     {
       if (really_expand)