Message ID | 20120424215703.8BAE5C187E@rong.mtv.corp.google.com |
---|---|
State | New |
Headers | show |
On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu <xur@google.com> wrote: > Hi, > > In split_function() (ipa-split.c), for the newly created call stmt, > its block is set to the outermost scope, i.e. > DECL_INITIAL(current_function_decl). When we inline this > partially outlined function, we create the new block based on the > block for the call stmt (in expand_call_inline()). > So the new block for the inlined body is in > parallel to the function top level scope. This bad block structure will > late result in a bad stack layout. We do not depend on the block structure any more when dealing with stack layout for variables in GCC 4.7.0 and above. I am not saying your patch is incorrect or not needed. Just it will not have an effect on variable stack layout. Did you have a testcase for the bad stack layout issue? Or was it too hard to produce one because the size matters? > > This patch fixes the issue by setting the correct block number. It > starts with the split_point entry bb to find the block stmt in the > outlined region. The entry_bb maybe an empty BB so we need to follow > the CFG until we find a non-empty bb. > > My early patch tried to use the block info from the first non-empty > bb in the outlined regision. But that failed bootstrap as some of the > stmts (assignment stmt) do not have the block info (bug?). So in this > patch I traverse all the stmts in the bb until getting the block info. > > Tested with gcc bootstap. On which target? Thanks, Andrew Pinski > > Thanks, > > 2012-04-24 Rong Xu <xur@google.com> > > * ipa-split.c (split_function): set the correct block for the > call statement. > > Index: ipa-split.c > =================================================================== > --- ipa-split.c (revision 186634) > +++ ipa-split.c (working copy) > @@ -948,7 +948,7 @@ > int num = 0; > struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl); > basic_block return_bb = find_return_bb (); > - basic_block call_bb; > + basic_block call_bb, bb; > gimple_stmt_iterator gsi; > gimple call; > edge e; > @@ -958,6 +958,7 @@ > gimple last_stmt = NULL; > unsigned int i; > tree arg; > + tree split_loc_block = NULL; > > if (dump_file) > { > @@ -1072,6 +1073,33 @@ > } > } > > + /* Find the block location of the split region. */ > + bb = split_point->entry_bb; > + do > + { > + bool found = false; > + > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + if (is_gimple_debug(gsi_stmt(gsi))) > + continue; > + if ((split_loc_block = gimple_block (gsi_stmt (gsi)))) > + { > + found = true; > + break; > + } > + } > + if (found) > + break; > + > + /* If we reach here, this bb should be an empty unconditional > + or fall-throuth branch. We continue with the succ bb. */ > + bb = single_succ (bb); > + } > + while (bb && bitmap_bit_p (split_point->split_bbs, bb->index)); > + > + gcc_assert (split_loc_block); > + > /* Now create the actual clone. */ > rebuild_cgraph_edges (); > node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip, > @@ -1115,7 +1143,7 @@ > VEC_replace (tree, args_to_pass, i, arg); > } > call = gimple_build_call_vec (node->decl, args_to_pass); > - gimple_set_block (call, DECL_INITIAL (current_function_decl)); > + gimple_set_block (call, split_loc_block); > > /* We avoid address being taken on any variable used by split part, > so return slot optimization is always possible. Moreover this is > > -- > This patch is available for review at http://codereview.appspot.com/6111050
please the find test case in the attachment. It shows the issue in google-4_6 branch -c -O2 -fno-strict-aliasing ss.C -fdump-rtl-expand-all in the rtl-expand dump, trianglevertices and one the gtest_ar are in the same partition. the problem is found in arm compiler, but we manager to reproduce in x86. -Rong On Tue, Apr 24, 2012 at 3:02 PM, Andrew Pinski <pinskia@gmail.com> wrote: > On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu <xur@google.com> wrote: >> Hi, >> >> In split_function() (ipa-split.c), for the newly created call stmt, >> its block is set to the outermost scope, i.e. >> DECL_INITIAL(current_function_decl). When we inline this >> partially outlined function, we create the new block based on the >> block for the call stmt (in expand_call_inline()). >> So the new block for the inlined body is in >> parallel to the function top level scope. This bad block structure will >> late result in a bad stack layout. > > We do not depend on the block structure any more when dealing with > stack layout for variables in GCC 4.7.0 and above. I am not saying > your patch is incorrect or not needed. Just it will not have an > effect on variable stack layout. > > Did you have a testcase for the bad stack layout issue? Or was it too > hard to produce one because the size matters? > > >> >> This patch fixes the issue by setting the correct block number. It >> starts with the split_point entry bb to find the block stmt in the >> outlined region. The entry_bb maybe an empty BB so we need to follow >> the CFG until we find a non-empty bb. >> >> My early patch tried to use the block info from the first non-empty >> bb in the outlined regision. But that failed bootstrap as some of the >> stmts (assignment stmt) do not have the block info (bug?). So in this >> patch I traverse all the stmts in the bb until getting the block info. >> >> Tested with gcc bootstap. > > On which target? > > Thanks, > Andrew Pinski > >> >> Thanks, >> >> 2012-04-24 Rong Xu <xur@google.com> >> >> * ipa-split.c (split_function): set the correct block for the >> call statement. >> >> Index: ipa-split.c >> =================================================================== >> --- ipa-split.c (revision 186634) >> +++ ipa-split.c (working copy) >> @@ -948,7 +948,7 @@ >> int num = 0; >> struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl); >> basic_block return_bb = find_return_bb (); >> - basic_block call_bb; >> + basic_block call_bb, bb; >> gimple_stmt_iterator gsi; >> gimple call; >> edge e; >> @@ -958,6 +958,7 @@ >> gimple last_stmt = NULL; >> unsigned int i; >> tree arg; >> + tree split_loc_block = NULL; >> >> if (dump_file) >> { >> @@ -1072,6 +1073,33 @@ >> } >> } >> >> + /* Find the block location of the split region. */ >> + bb = split_point->entry_bb; >> + do >> + { >> + bool found = false; >> + >> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> + { >> + if (is_gimple_debug(gsi_stmt(gsi))) >> + continue; >> + if ((split_loc_block = gimple_block (gsi_stmt (gsi)))) >> + { >> + found = true; >> + break; >> + } >> + } >> + if (found) >> + break; >> + >> + /* If we reach here, this bb should be an empty unconditional >> + or fall-throuth branch. We continue with the succ bb. */ >> + bb = single_succ (bb); >> + } >> + while (bb && bitmap_bit_p (split_point->split_bbs, bb->index)); >> + >> + gcc_assert (split_loc_block); >> + >> /* Now create the actual clone. */ >> rebuild_cgraph_edges (); >> node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip, >> @@ -1115,7 +1143,7 @@ >> VEC_replace (tree, args_to_pass, i, arg); >> } >> call = gimple_build_call_vec (node->decl, args_to_pass); >> - gimple_set_block (call, DECL_INITIAL (current_function_decl)); >> + gimple_set_block (call, split_loc_block); >> >> /* We avoid address being taken on any variable used by split part, >> so return slot optimization is always possible. Moreover this is >> >> -- >> This patch is available for review at http://codereview.appspot.com/6111050
> We do not depend on the block structure any more when dealing with > stack layout for variables in GCC 4.7.0 and above. I am not saying > your patch is incorrect or not needed. Just it will not have an > effect on variable stack layout. It might be worth backporting to the 4.6 branch though, these stack layout issues are very nasty.
On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> We do not depend on the block structure any more when dealing with >> stack layout for variables in GCC 4.7.0 and above. I am not saying >> your patch is incorrect or not needed. Just it will not have an >> effect on variable stack layout. > > It might be worth backporting to the 4.6 branch though, these stack layout > issues are very nasty. What about not setting a BLOCK on the call stmt? That said, I can't see how outlining a SESE region that starts at the beginning of the function is not conservatively using the outermost BLOCK ... so, is the bug not elsewhere? Richard. > -- > Eric Botcazou
> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: > >> We do not depend on the block structure any more when dealing with > >> stack layout for variables in GCC 4.7.0 and above. I am not saying > >> your patch is incorrect or not needed. Just it will not have an > >> effect on variable stack layout. > > > > It might be worth backporting to the 4.6 branch though, these stack layout > > issues are very nasty. > > What about not setting a BLOCK on the call stmt? That said, I can't see I recall that not seetting the block did lead to ICE... > how outlining a SESE region that starts at the beginning of the function > is not conservatively using the outermost BLOCK ... so, is the bug not > elsewhere? ... so I made the exactly same conclussion. The problem is with stack vars allocated in header that survive till the tail part? Honza > > Richard. > > > -- > > Eric Botcazou
On Fri, Apr 27, 2012 at 5:06 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> >> We do not depend on the block structure any more when dealing with >> >> stack layout for variables in GCC 4.7.0 and above. I am not saying >> >> your patch is incorrect or not needed. Just it will not have an >> >> effect on variable stack layout. >> > >> > It might be worth backporting to the 4.6 branch though, these stack layout >> > issues are very nasty. >> >> What about not setting a BLOCK on the call stmt? That said, I can't see > > I recall that not seetting the block did lead to ICE... > >> how outlining a SESE region that starts at the beginning of the function >> is not conservatively using the outermost BLOCK ... so, is the bug not >> elsewhere? > > ... so I made the exactly same conclussion. > The problem is with stack vars allocated in header that survive till the > tail part? A SESE region does not necessary at the beginning of the function. They can be anywhere. In the example I attached, it is at the end of function. Even if the outlined region is at the beginning the function. setting the call_stmt as the outermost block is also incorrect. For c++ programs, the block for function level local variables is not DECL_INITIAL(current_function_decl). It is the subblock of DECL_INITIAL(current_function_decl). This is different from c programs. -Rong > > Honza >> >> Richard. >> >> > -- >> > Eric Botcazou
Index: ipa-split.c =================================================================== --- ipa-split.c (revision 186634) +++ ipa-split.c (working copy) @@ -948,7 +948,7 @@ int num = 0; struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl); basic_block return_bb = find_return_bb (); - basic_block call_bb; + basic_block call_bb, bb; gimple_stmt_iterator gsi; gimple call; edge e; @@ -958,6 +958,7 @@ gimple last_stmt = NULL; unsigned int i; tree arg; + tree split_loc_block = NULL; if (dump_file) { @@ -1072,6 +1073,33 @@ } } + /* Find the block location of the split region. */ + bb = split_point->entry_bb; + do + { + bool found = false; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + if (is_gimple_debug(gsi_stmt(gsi))) + continue; + if ((split_loc_block = gimple_block (gsi_stmt (gsi)))) + { + found = true; + break; + } + } + if (found) + break; + + /* If we reach here, this bb should be an empty unconditional + or fall-throuth branch. We continue with the succ bb. */ + bb = single_succ (bb); + } + while (bb && bitmap_bit_p (split_point->split_bbs, bb->index)); + + gcc_assert (split_loc_block); + /* Now create the actual clone. */ rebuild_cgraph_edges (); node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip, @@ -1115,7 +1143,7 @@ VEC_replace (tree, args_to_pass, i, arg); } call = gimple_build_call_vec (node->decl, args_to_pass); - gimple_set_block (call, DECL_INITIAL (current_function_decl)); + gimple_set_block (call, split_loc_block); /* We avoid address being taken on any variable used by split part, so return slot optimization is always possible. Moreover this is