Message ID | 20190612072359.GR19695@tucnak |
---|---|
State | New |
Headers | show |
Series | Don't adjust DECL_ALIGN of variables during estimated_stack_frame_size (PR target/90811) | expand |
On Wed, 12 Jun 2019, Jakub Jelinek wrote: > Hi! > > Apparently estimated_stack_frame_size -> expand_one_var -> add_stack_var > -> align_local_variable changes DECL_ALIGN of stack variables from > LOCAL_DECL_ALIGNMENT. > > That is unfortunately very undesirable for offloading, because this happens > early during IPA where we stream out LTO bytecode for the target offloading > after it, so on the PR90811 testcase x86_64 backend says a local variable > would be best 128-bit aligned, but in the PTX backend such alignment means > runtime (virtual) stack adjustments because only 64-bit alignment is > guaranteed. > > The following patch makes sure we don't update DECL_ALIGN during stack frame > size estimation, just during actual expansion. > > Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested > with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix > reverted to verify the overaligned variables are gone. Ok for trunk? Isn't there hunk missing that actually passes false? > Or, is there something in GIMPLE passes that would benefit from the updated > DECL_ALIGN during post-IPA optimizations, rather than just in RTL passes? > If yes, we should call those somewhere post-IPA on top of this patch. > Nothing in the testsuite showed it though. I guess only CCPs bit-value/alignment tracking sees the difference. Note we are already aligning variables in stor-layout.c with target information so in some way this isn't a real fix. Offloading as implemented right now really leaks target dependent decisions from the target to the offload target. Not opposed to the change though a comment in the align_local_variable hunk as to why we only adjust DECL_ALIGN late would be appreciated. Richard. > 2019-06-12 Jakub Jelinek <jakub@redhat.com> > > PR target/90811 > * cfgexpand.c (align_local_variable): Add really_expand argument, > don't SET_DECL_ALIGN if it is false. > (add_stack_var): Add really_expand argument, pass it through to > align_local_variable. > (expand_one_stack_var_1): Pass true as really_expand to > align_local_variable. > (expand_one_ssa_partition): Pass true as really_expand to > add_stack_var. > (expand_one_var): Pass really_expand through to add_stack_var. > > --- gcc/cfgexpand.c.jj 2019-05-20 11:39:34.059816432 +0200 > +++ gcc/cfgexpand.c 2019-06-11 11:28:08.720932421 +0200 > @@ -361,7 +361,7 @@ static bool has_short_buffer; > we can't do with expected alignment of the stack boundary. */ > > static unsigned int > -align_local_variable (tree decl) > +align_local_variable (tree decl, bool really_expand) > { > unsigned int align; > > @@ -370,7 +370,8 @@ align_local_variable (tree decl) > else > { > align = LOCAL_DECL_ALIGNMENT (decl); > - SET_DECL_ALIGN (decl, align); > + if (really_expand) > + SET_DECL_ALIGN (decl, align); > } > return align / BITS_PER_UNIT; > } > @@ -418,7 +419,7 @@ alloc_stack_frame_space (poly_int64 size > /* Accumulate DECL into STACK_VARS. */ > > static void > -add_stack_var (tree decl) > +add_stack_var (tree decl, bool really_expand) > { > struct stack_var *v; > > @@ -446,7 +447,7 @@ add_stack_var (tree decl) > variables that are simultaneously live. */ > if (known_eq (v->size, 0U)) > v->size = 1; > - v->alignb = align_local_variable (decl); > + v->alignb = align_local_variable (decl, really_expand); > /* An alignment of zero can mightily confuse us later. */ > gcc_assert (v->alignb != 0); > > @@ -1323,7 +1324,7 @@ expand_one_stack_var_1 (tree var) > else > { > size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var)); > - byte_align = align_local_variable (var); > + byte_align = align_local_variable (var, true); > } > > /* We handle highly aligned variables in expand_stack_vars. */ > @@ -1413,7 +1414,7 @@ expand_one_ssa_partition (tree var) > if (!use_register_for_decl (var)) > { > if (defer_stack_allocation (var, true)) > - add_stack_var (var); > + add_stack_var (var, true); > else > expand_one_stack_var_1 (var); > return; > @@ -1695,7 +1696,7 @@ expand_one_var (tree var, bool toplevel, > } > } > else if (defer_stack_allocation (var, toplevel)) > - add_stack_var (origvar); > + add_stack_var (origvar, really_expand); > else > { > if (really_expand) > > Jakub >
On Wed, Jun 12, 2019 at 11:00:43AM +0200, Richard Biener wrote: > > Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested > > with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix > > reverted to verify the overaligned variables are gone. Ok for trunk? > > Isn't there hunk missing that actually passes false? It is that + add_stack_var (origvar, really_expand); expand_one_var is called 3 times: expand_one_var (t, toplevel, true); size += expand_one_var (var, true, false); expand_one_var (var, true, true); where the second one is the estimated_stack_frame_size call where it passes down false as really_expand. And the patch passes it down through to align_local_variable. > I guess only CCPs bit-value/alignment tracking sees the difference. > > Note we are already aligning variables in stor-layout.c with target > information so in some way this isn't a real fix. Offloading as > implemented right now really leaks target dependent decisions from > the target to the offload target. I was thinking about only honoring DECL_ALIGN for non-FIELD_DECLs when streaming in ACCEL_COMPILER only if DECL_USER_ALIGN, the risk is if we bump DECL_ALIGN somewhere pre-IPA on the host side and some optimization actually optimizes code using that DECL_ALIGN, later on we stream offloading out and in, downgrade DECL_ALIGN and suddenly the optimization was invalid. > Not opposed to the change though a comment in the align_local_variable > hunk as to why we only adjust DECL_ALIGN late would be appreciated. Sure, that can be done: > > --- gcc/cfgexpand.c.jj 2019-05-20 11:39:34.059816432 +0200 > > +++ gcc/cfgexpand.c 2019-06-11 11:28:08.720932421 +0200 > > @@ -361,7 +361,7 @@ static bool has_short_buffer; > > we can't do with expected alignment of the stack boundary. */ > > > > static unsigned int > > -align_local_variable (tree decl) > > +align_local_variable (tree decl, bool really_expand) > > { > > unsigned int align; > > > > @@ -370,7 +370,8 @@ align_local_variable (tree decl) > > else > > { > > align = LOCAL_DECL_ALIGNMENT (decl); > > - SET_DECL_ALIGN (decl, align); + /* Don't change DECL_ALIGN when called from estimated_stack_frame_size. + That is done before IPA and could bump alignment based on host + backend even for offloaded code which wants different + LOCAL_DECL_ALIGNMENT. */ > > + if (really_expand) > > + SET_DECL_ALIGN (decl, align); > > } > > return align / BITS_PER_UNIT; > > } Jakub
On Wed, 12 Jun 2019, Jakub Jelinek wrote: > On Wed, Jun 12, 2019 at 11:00:43AM +0200, Richard Biener wrote: > > > Bootstrapped/regtested on x86_64-linux and i686-linux and furthermore tested > > > with x86_64-linux to nvptx-none offloading with the nvptx.c PR90811 fix > > > reverted to verify the overaligned variables are gone. Ok for trunk? > > > > Isn't there hunk missing that actually passes false? > > It is that > + add_stack_var (origvar, really_expand); > expand_one_var is called 3 times: > expand_one_var (t, toplevel, true); > size += expand_one_var (var, true, false); > expand_one_var (var, true, true); > where the second one is the estimated_stack_frame_size call where it passes > down false as really_expand. And the patch passes it down through to > align_local_variable. Ah, OK. > > I guess only CCPs bit-value/alignment tracking sees the difference. > > > > Note we are already aligning variables in stor-layout.c with target > > information so in some way this isn't a real fix. Offloading as > > implemented right now really leaks target dependent decisions from > > the target to the offload target. > > I was thinking about only honoring DECL_ALIGN for non-FIELD_DECLs when > streaming in ACCEL_COMPILER only if DECL_USER_ALIGN, the risk is if > we bump DECL_ALIGN somewhere pre-IPA on the host side and some optimization > actually optimizes code using that DECL_ALIGN, later on we stream offloading > out and in, downgrade DECL_ALIGN and suddenly the optimization was invalid. Yeah, I think that's too dangerous. We could stream the offload part before any early optimization. Or mark offload functions and skip the early pipeline for them, also not do IPA on them. But I guess optimizing, esp. inlining into offloaded functions is important. > > Not opposed to the change though a comment in the align_local_variable > > hunk as to why we only adjust DECL_ALIGN late would be appreciated. > > Sure, that can be done: > > > > --- gcc/cfgexpand.c.jj 2019-05-20 11:39:34.059816432 +0200 > > > +++ gcc/cfgexpand.c 2019-06-11 11:28:08.720932421 +0200 > > > @@ -361,7 +361,7 @@ static bool has_short_buffer; > > > we can't do with expected alignment of the stack boundary. */ > > > > > > static unsigned int > > > -align_local_variable (tree decl) > > > +align_local_variable (tree decl, bool really_expand) > > > { > > > unsigned int align; > > > > > > @@ -370,7 +370,8 @@ align_local_variable (tree decl) > > > else > > > { > > > align = LOCAL_DECL_ALIGNMENT (decl); > > > - SET_DECL_ALIGN (decl, align); > > + /* Don't change DECL_ALIGN when called from estimated_stack_frame_size. > + That is done before IPA and could bump alignment based on host > + backend even for offloaded code which wants different > + LOCAL_DECL_ALIGNMENT. */ > > > > + if (really_expand) > > > + SET_DECL_ALIGN (decl, align); Looks good to me with this change. Thanks, Richard.
--- gcc/cfgexpand.c.jj 2019-05-20 11:39:34.059816432 +0200 +++ gcc/cfgexpand.c 2019-06-11 11:28:08.720932421 +0200 @@ -361,7 +361,7 @@ static bool has_short_buffer; we can't do with expected alignment of the stack boundary. */ static unsigned int -align_local_variable (tree decl) +align_local_variable (tree decl, bool really_expand) { unsigned int align; @@ -370,7 +370,8 @@ align_local_variable (tree decl) else { align = LOCAL_DECL_ALIGNMENT (decl); - SET_DECL_ALIGN (decl, align); + if (really_expand) + SET_DECL_ALIGN (decl, align); } return align / BITS_PER_UNIT; } @@ -418,7 +419,7 @@ alloc_stack_frame_space (poly_int64 size /* Accumulate DECL into STACK_VARS. */ static void -add_stack_var (tree decl) +add_stack_var (tree decl, bool really_expand) { struct stack_var *v; @@ -446,7 +447,7 @@ add_stack_var (tree decl) variables that are simultaneously live. */ if (known_eq (v->size, 0U)) v->size = 1; - v->alignb = align_local_variable (decl); + v->alignb = align_local_variable (decl, really_expand); /* An alignment of zero can mightily confuse us later. */ gcc_assert (v->alignb != 0); @@ -1323,7 +1324,7 @@ expand_one_stack_var_1 (tree var) else { size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var)); - byte_align = align_local_variable (var); + byte_align = align_local_variable (var, true); } /* We handle highly aligned variables in expand_stack_vars. */ @@ -1413,7 +1414,7 @@ expand_one_ssa_partition (tree var) if (!use_register_for_decl (var)) { if (defer_stack_allocation (var, true)) - add_stack_var (var); + add_stack_var (var, true); else expand_one_stack_var_1 (var); return; @@ -1695,7 +1696,7 @@ expand_one_var (tree var, bool toplevel, } } else if (defer_stack_allocation (var, toplevel)) - add_stack_var (origvar); + add_stack_var (origvar, really_expand); else { if (really_expand)