Patchwork set the correct block info for the call stmt in fnsplit (issue6111050)

login
register
mail settings
Submitter Rong Xu
Date April 24, 2012, 9:57 p.m.
Message ID <20120424215703.8BAE5C187E@rong.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/154765/
State New
Headers show

Comments

Rong Xu - April 24, 2012, 9:57 p.m.
Hi,

In split_function() (ipa-split.c), for the newly created call stmt,
its block is set to the outermost scope, i.e.
DECL_INITIAL(current_function_decl). When we inline this
partially outlined function, we create the new block based on the
block for the call stmt (in expand_call_inline()). 
So the new block for the inlined body is in
parallel to the function top level scope. This bad block structure will
late result in a bad stack layout.

This patch fixes the issue by setting the correct block number. It 
starts with the split_point entry bb to find the block stmt in the 
outlined region. The entry_bb maybe an empty BB so we need to follow
the CFG until we find a non-empty bb.

My early patch tried to use the block info from the first non-empty 
bb in the outlined regision. But that failed bootstrap as some of the
stmts (assignment stmt) do not have the block info (bug?). So in this
patch I traverse all the stmts in the bb until getting the block info.

Tested with gcc bootstap.

Thanks,

2012-04-24   Rong Xu  <xur@google.com>

	* ipa-split.c (split_function): set the correct block for the
        call statement.


--
This patch is available for review at http://codereview.appspot.com/6111050
Andrew Pinski - April 24, 2012, 10:02 p.m.
On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu <xur@google.com> wrote:
> Hi,
>
> In split_function() (ipa-split.c), for the newly created call stmt,
> its block is set to the outermost scope, i.e.
> DECL_INITIAL(current_function_decl). When we inline this
> partially outlined function, we create the new block based on the
> block for the call stmt (in expand_call_inline()).
> So the new block for the inlined body is in
> parallel to the function top level scope. This bad block structure will
> late result in a bad stack layout.

We do not depend on the block structure any more when dealing with
stack layout for variables in GCC 4.7.0 and above.  I am not saying
your patch is incorrect or not needed.  Just it will not have an
effect on variable stack layout.

Did you have a testcase for the bad stack layout issue?  Or was it too
hard to produce one because the size matters?


>
> This patch fixes the issue by setting the correct block number. It
> starts with the split_point entry bb to find the block stmt in the
> outlined region. The entry_bb maybe an empty BB so we need to follow
> the CFG until we find a non-empty bb.
>
> My early patch tried to use the block info from the first non-empty
> bb in the outlined regision. But that failed bootstrap as some of the
> stmts (assignment stmt) do not have the block info (bug?). So in this
> patch I traverse all the stmts in the bb until getting the block info.
>
> Tested with gcc bootstap.

On which target?

Thanks,
Andrew Pinski

>
> Thanks,
>
> 2012-04-24   Rong Xu  <xur@google.com>
>
>        * ipa-split.c (split_function): set the correct block for the
>        call statement.
>
> Index: ipa-split.c
> ===================================================================
> --- ipa-split.c (revision 186634)
> +++ ipa-split.c (working copy)
> @@ -948,7 +948,7 @@
>   int num = 0;
>   struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
>   basic_block return_bb = find_return_bb ();
> -  basic_block call_bb;
> +  basic_block call_bb, bb;
>   gimple_stmt_iterator gsi;
>   gimple call;
>   edge e;
> @@ -958,6 +958,7 @@
>   gimple last_stmt = NULL;
>   unsigned int i;
>   tree arg;
> +  tree split_loc_block = NULL;
>
>   if (dump_file)
>     {
> @@ -1072,6 +1073,33 @@
>        }
>     }
>
> +  /* Find the block location of the split region.  */
> +  bb = split_point->entry_bb;
> +  do
> +    {
> +      bool found = false;
> +
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +        {
> +          if (is_gimple_debug(gsi_stmt(gsi)))
> +            continue;
> +          if ((split_loc_block = gimple_block (gsi_stmt (gsi))))
> +            {
> +              found = true;
> +              break;
> +            }
> +        }
> +      if (found)
> +        break;
> +
> +      /* If we reach here, this bb should be an empty unconditional
> +         or fall-throuth branch. We continue with the succ bb.  */
> +      bb = single_succ (bb);
> +    }
> +  while (bb && bitmap_bit_p (split_point->split_bbs, bb->index));
> +
> +  gcc_assert (split_loc_block);
> +
>   /* Now create the actual clone.  */
>   rebuild_cgraph_edges ();
>   node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip,
> @@ -1115,7 +1143,7 @@
>        VEC_replace (tree, args_to_pass, i, arg);
>       }
>   call = gimple_build_call_vec (node->decl, args_to_pass);
> -  gimple_set_block (call, DECL_INITIAL (current_function_decl));
> +  gimple_set_block (call, split_loc_block);
>
>   /* We avoid address being taken on any variable used by split part,
>      so return slot optimization is always possible.  Moreover this is
>
> --
> This patch is available for review at http://codereview.appspot.com/6111050
Rong Xu - April 24, 2012, 10:05 p.m.
please the find test case in the attachment. It shows the issue in
google-4_6 branch

-c -O2 -fno-strict-aliasing ss.C -fdump-rtl-expand-all

in the rtl-expand dump,  trianglevertices and one the gtest_ar are in
the same partition.

the problem is found in arm compiler, but we manager to reproduce in x86.

-Rong


On Tue, Apr 24, 2012 at 3:02 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu <xur@google.com> wrote:
>> Hi,
>>
>> In split_function() (ipa-split.c), for the newly created call stmt,
>> its block is set to the outermost scope, i.e.
>> DECL_INITIAL(current_function_decl). When we inline this
>> partially outlined function, we create the new block based on the
>> block for the call stmt (in expand_call_inline()).
>> So the new block for the inlined body is in
>> parallel to the function top level scope. This bad block structure will
>> late result in a bad stack layout.
>
> We do not depend on the block structure any more when dealing with
> stack layout for variables in GCC 4.7.0 and above.  I am not saying
> your patch is incorrect or not needed.  Just it will not have an
> effect on variable stack layout.
>
> Did you have a testcase for the bad stack layout issue?  Or was it too
> hard to produce one because the size matters?
>
>
>>
>> This patch fixes the issue by setting the correct block number. It
>> starts with the split_point entry bb to find the block stmt in the
>> outlined region. The entry_bb maybe an empty BB so we need to follow
>> the CFG until we find a non-empty bb.
>>
>> My early patch tried to use the block info from the first non-empty
>> bb in the outlined regision. But that failed bootstrap as some of the
>> stmts (assignment stmt) do not have the block info (bug?). So in this
>> patch I traverse all the stmts in the bb until getting the block info.
>>
>> Tested with gcc bootstap.
>
> On which target?
>
> Thanks,
> Andrew Pinski
>
>>
>> Thanks,
>>
>> 2012-04-24   Rong Xu  <xur@google.com>
>>
>>        * ipa-split.c (split_function): set the correct block for the
>>        call statement.
>>
>> Index: ipa-split.c
>> ===================================================================
>> --- ipa-split.c (revision 186634)
>> +++ ipa-split.c (working copy)
>> @@ -948,7 +948,7 @@
>>   int num = 0;
>>   struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
>>   basic_block return_bb = find_return_bb ();
>> -  basic_block call_bb;
>> +  basic_block call_bb, bb;
>>   gimple_stmt_iterator gsi;
>>   gimple call;
>>   edge e;
>> @@ -958,6 +958,7 @@
>>   gimple last_stmt = NULL;
>>   unsigned int i;
>>   tree arg;
>> +  tree split_loc_block = NULL;
>>
>>   if (dump_file)
>>     {
>> @@ -1072,6 +1073,33 @@
>>        }
>>     }
>>
>> +  /* Find the block location of the split region.  */
>> +  bb = split_point->entry_bb;
>> +  do
>> +    {
>> +      bool found = false;
>> +
>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +        {
>> +          if (is_gimple_debug(gsi_stmt(gsi)))
>> +            continue;
>> +          if ((split_loc_block = gimple_block (gsi_stmt (gsi))))
>> +            {
>> +              found = true;
>> +              break;
>> +            }
>> +        }
>> +      if (found)
>> +        break;
>> +
>> +      /* If we reach here, this bb should be an empty unconditional
>> +         or fall-throuth branch. We continue with the succ bb.  */
>> +      bb = single_succ (bb);
>> +    }
>> +  while (bb && bitmap_bit_p (split_point->split_bbs, bb->index));
>> +
>> +  gcc_assert (split_loc_block);
>> +
>>   /* Now create the actual clone.  */
>>   rebuild_cgraph_edges ();
>>   node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip,
>> @@ -1115,7 +1143,7 @@
>>        VEC_replace (tree, args_to_pass, i, arg);
>>       }
>>   call = gimple_build_call_vec (node->decl, args_to_pass);
>> -  gimple_set_block (call, DECL_INITIAL (current_function_decl));
>> +  gimple_set_block (call, split_loc_block);
>>
>>   /* We avoid address being taken on any variable used by split part,
>>      so return slot optimization is always possible.  Moreover this is
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/6111050
Eric Botcazou - April 27, 2012, 10:50 a.m.
> We do not depend on the block structure any more when dealing with
> stack layout for variables in GCC 4.7.0 and above.  I am not saying
> your patch is incorrect or not needed.  Just it will not have an
> effect on variable stack layout.

It might be worth backporting to the 4.6 branch though, these stack layout 
issues are very nasty.
Richard Guenther - April 27, 2012, 11:04 a.m.
On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> We do not depend on the block structure any more when dealing with
>> stack layout for variables in GCC 4.7.0 and above.  I am not saying
>> your patch is incorrect or not needed.  Just it will not have an
>> effect on variable stack layout.
>
> It might be worth backporting to the 4.6 branch though, these stack layout
> issues are very nasty.

What about not setting a BLOCK on the call stmt?  That said, I can't see
how outlining a SESE region that starts at the beginning of the function
is not conservatively using the outermost BLOCK ... so, is the bug not
elsewhere?

Richard.

> --
> Eric Botcazou
Jan Hubicka - April 27, 2012, 12:06 p.m.
> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> >> We do not depend on the block structure any more when dealing with
> >> stack layout for variables in GCC 4.7.0 and above.  I am not saying
> >> your patch is incorrect or not needed.  Just it will not have an
> >> effect on variable stack layout.
> >
> > It might be worth backporting to the 4.6 branch though, these stack layout
> > issues are very nasty.
> 
> What about not setting a BLOCK on the call stmt?  That said, I can't see

I recall that not seetting the block did lead to ICE...

> how outlining a SESE region that starts at the beginning of the function
> is not conservatively using the outermost BLOCK ... so, is the bug not
> elsewhere?

... so I made the exactly same conclussion.
The problem is with stack vars allocated in header that survive till the
tail part?

Honza
> 
> Richard.
> 
> > --
> > Eric Botcazou
Rong Xu - April 27, 2012, 5:30 p.m.
On Fri, Apr 27, 2012 at 5:06 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> >> We do not depend on the block structure any more when dealing with
>> >> stack layout for variables in GCC 4.7.0 and above.  I am not saying
>> >> your patch is incorrect or not needed.  Just it will not have an
>> >> effect on variable stack layout.
>> >
>> > It might be worth backporting to the 4.6 branch though, these stack layout
>> > issues are very nasty.
>>
>> What about not setting a BLOCK on the call stmt?  That said, I can't see
>
> I recall that not seetting the block did lead to ICE...
>
>> how outlining a SESE region that starts at the beginning of the function
>> is not conservatively using the outermost BLOCK ... so, is the bug not
>> elsewhere?
>
> ... so I made the exactly same conclussion.
> The problem is with stack vars allocated in header that survive till the
> tail part?

A SESE region does not necessary at the beginning of the function.
They can be anywhere.
In the example I attached, it is at the end of function.

Even if the outlined region is at the beginning the function. setting
the call_stmt as the outermost
block is also incorrect.
For c++ programs, the block for function level local variables is not
DECL_INITIAL(current_function_decl).
It is the subblock of DECL_INITIAL(current_function_decl).
This is different from c programs.

-Rong

>
> Honza
>>
>> Richard.
>>
>> > --
>> > Eric Botcazou

Patch

Index: ipa-split.c
===================================================================
--- ipa-split.c	(revision 186634)
+++ ipa-split.c	(working copy)
@@ -948,7 +948,7 @@ 
   int num = 0;
   struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
   basic_block return_bb = find_return_bb ();
-  basic_block call_bb;
+  basic_block call_bb, bb;
   gimple_stmt_iterator gsi;
   gimple call;
   edge e;
@@ -958,6 +958,7 @@ 
   gimple last_stmt = NULL;
   unsigned int i;
   tree arg;
+  tree split_loc_block = NULL;
 
   if (dump_file)
     {
@@ -1072,6 +1073,33 @@ 
 	}
     }
 
+  /* Find the block location of the split region.  */
+  bb = split_point->entry_bb;
+  do
+    {
+      bool found = false;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          if (is_gimple_debug(gsi_stmt(gsi)))
+            continue;
+          if ((split_loc_block = gimple_block (gsi_stmt (gsi))))
+            {
+              found = true;
+              break;
+            }
+        }
+      if (found)
+        break;
+
+      /* If we reach here, this bb should be an empty unconditional
+         or fall-throuth branch. We continue with the succ bb.  */
+      bb = single_succ (bb);
+    }
+  while (bb && bitmap_bit_p (split_point->split_bbs, bb->index));
+
+  gcc_assert (split_loc_block);
+
   /* Now create the actual clone.  */
   rebuild_cgraph_edges ();
   node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip,
@@ -1115,7 +1143,7 @@ 
 	VEC_replace (tree, args_to_pass, i, arg);
       }
   call = gimple_build_call_vec (node->decl, args_to_pass);
-  gimple_set_block (call, DECL_INITIAL (current_function_decl));
+  gimple_set_block (call, split_loc_block);
 
   /* We avoid address being taken on any variable used by split part,
      so return slot optimization is always possible.  Moreover this is