diff mbox

Improve stack layout heuristic.

Message ID BANLkTinwjB+1fwrNt_FiqQ-5SA5YvT7MvQ@mail.gmail.com
State New
Headers show

Commit Message

Easwaran Raman April 19, 2011, 12:24 a.m. UTC
On Mon, Apr 18, 2011 at 5:58 AM, Michael Matz <matz@suse.de> wrote:
>
> Hi,
>
> [FWIW I can't approve patches, but some feedback nevertheless]
>
> On Sun, 17 Apr 2011, Easwaran Raman wrote:
>
> >  This patch impoves the heuristic used in assigning stack location to
> > stack variables.  Currently, if there are 3 variables A, B and C with
> > their sizes in increasing order and A and C have a conflict, it will
> > put A and B in a partition and C in a separate partition with a total
> > frame size of sizeof(B) + sizeof(C).  This patch puts B and C in the
> > same partition and A in a separate partition, with a total size of
> > sizeof(A) + sizeof(C).
>
> That's the change in stack_var_cmp, to match the comment, right?  Makes
> sense.
>
> > The other change in this patch removes the field offset in struct
> > stack_var. Right now, the field is always 0 due to the way it is set in
> > partition_stack_vars.
>
> Huh, indeed, starting with the initial version of that code already.  The
> idea clearly was to pack multiple conflicting small objects into the space
> of only one larger non-conflicting one by placing them at different
> offsets, but that never seems to have been implemented and would require
> different tracking of conflicts.  I think removing the offset tracking
> right now is okay, can be reintroduced once somebody gets motivated
> enough.

Yes, the intent seems to be what you describe above. I haven't thought
about it much, but I am not sure how much a non-backtracking version
will buy over the current heuristic.

>
> > -     and merge them into partition A.  */
> > -  for (last = i = b; i != EOC; last = i, i = stack_vars[i].next)
> > -    {
> > -      stack_vars[i].offset += offset;
> > -      stack_vars[i].representative = a;
> > -    }
> > -  stack_vars[last].next = stack_vars[a].next;
> > +   /* Add B to A's partition.  */
> > +  stack_vars[b].next = stack_vars[a].next;
> > +  stack_vars[b].representative = a;
>
> Hmm.  This seems fishy.  You don't update the representatives of the
> members of the partition that B is the leader of.  Additionally you break
> the chain of B's members.  That is only a problem if B actually has more
> than one member.  That might be always the case with your changed sorting
> order, but it's an important invariant, so please assert this in this
> routine.
>
B always has one member is an invariant in my scheme. The object
corresponding to the outer loop index is always the representative. If
B has to have more than one members, it must have been visited in the
outer loop in some earlier iteration. But then, its index in the
stack_vars_sorted must be greater than the current i. I have added an
assertion than stack_vars[b].next == EOC in union_stack_vars which is
true only when B has one member.

>
> > @@ -596,7 +581,7 @@
> >    if (vb->conflicts)
> >      {
> >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> > -     add_stack_var_conflict (a, stack_vars[u].representative);
> > +     add_stack_var_conflict (a, u);
>
> Please don't.  This uselessly bloats the conflict bitmaps.

It is sufficient to add the conflicts of  a variable only when it is
not merged into some group.  I am adding a check to that effect.
I am attaching a new patch which excludes the strict aliasing check
(which I will send separately) and the changes I have mentioned above.
Bootstraps and no regressions in tests.

Thanks,
Easwaran


>
> Ciao,
> Michael.

Comments

Michael Matz April 19, 2011, 12:08 p.m. UTC | #1
Hi,

On Mon, 18 Apr 2011, Easwaran Raman wrote:

> > > @@ -596,7 +581,7 @@
> > >    if (vb->conflicts)
> > >      {
> > >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> > > -     add_stack_var_conflict (a, stack_vars[u].representative);
> > > +     add_stack_var_conflict (a, u);
> >
> > Please don't.  This uselessly bloats the conflict bitmaps.
> 
> It is sufficient to add the conflicts of  a variable only when it is
> not merged into some group.

That is correct but is also what the use of stack_vars[u].representative 
achieves alone, ...

> I am adding a check to that effect.

... without any check.

@@ -596,7 +581,8 @@
   if (vb->conflicts)
     {
       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
-       add_stack_var_conflict (a, stack_vars[u].representative);
+        if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
+          add_stack_var_conflict (a, u);
       BITMAP_FREE (vb->conflicts);
     }
 }

What's your objective with this change?  I find the original code clearer.


Ciao,
Michael.
Easwaran Raman April 19, 2011, 4:42 p.m. UTC | #2
On Tue, Apr 19, 2011 at 5:08 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 18 Apr 2011, Easwaran Raman wrote:
>
>> > > @@ -596,7 +581,7 @@
>> > >    if (vb->conflicts)
>> > >      {
>> > >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
>> > > -     add_stack_var_conflict (a, stack_vars[u].representative);
>> > > +     add_stack_var_conflict (a, u);
>> >
>> > Please don't.  This uselessly bloats the conflict bitmaps.
>>
>> It is sufficient to add the conflicts of  a variable only when it is
>> not merged into some group.
>
> That is correct but is also what the use of stack_vars[u].representative
> achieves alone, ...
>
>> I am adding a check to that effect.
>
> ... without any check.
>
> @@ -596,7 +581,8 @@
>   if (vb->conflicts)
>     {
>       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> -       add_stack_var_conflict (a, stack_vars[u].representative);
> +        if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
> +          add_stack_var_conflict (a, u);
>       BITMAP_FREE (vb->conflicts);
>     }
>  }
>
> What's your objective with this change?  I find the original code clearer.

Let us say we try to merge 'a' to 'b' and 'a' has conflicts with many
members of an existing partition C. It is not necessary to add all
those conflicts to 'b' since they will be never considered in the call
to union_stack_vars. I was motivated by your comment on bit-vector
bloat to try this, but if this affects readability I'll happily revert
back to what it was before.

-Easwaran

>
>
> Ciao,
> Michael.
Michael Matz April 20, 2011, 1:53 p.m. UTC | #3
Hi,

On Tue, 19 Apr 2011, Easwaran Raman wrote:

> > That is correct but is also what the use of stack_vars[u].representative
> > achieves alone, ...
> >
> >> I am adding a check to that effect.
> >
> > ... without any check.
> >
> > @@ -596,7 +581,8 @@
> >   if (vb->conflicts)
> >     {
> >       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
> > -       add_stack_var_conflict (a, stack_vars[u].representative);
> > +        if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
> > +          add_stack_var_conflict (a, u);
> >       BITMAP_FREE (vb->conflicts);
> >     }
> >  }
> >
> > What's your objective with this change?  I find the original code clearer.
> 
> Let us say we try to merge 'a' to 'b' and 'a' has conflicts with many
> members of an existing partition C. It is not necessary to add all
> those conflicts to 'b' since they will be never considered in the call
> to union_stack_vars.

Right, that's why I was objecting to your initial change.  The original 
code (adding stack_vars[u].representative to the conflicts of A) made sure 
the target conflict bitmap only got representatives added.  That's why I 
was asking why you changed this area at all.

> I was motivated by your comment on bit-vector bloat to try this, but if 
> this affects readability I'll happily revert back to what it was before.


Ciao,
Michael.
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/stack-layout-2.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-layout-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-2.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+void bar( char *);
+int foo()
+{
+  int i=0;
+  {
+    char a[8000];
+    bar(a);
+    i += a[0];
+  }
+  {
+    char a[8192];
+    char b[32];
+    bar(a);
+    i += a[0];
+    bar(b);
+    i += a[0];
+  }
+  return i;
+}
+/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */
+/* { dg-final { scan-rtl-dump "size 32" "expand" } } */
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	(revision 171954)
+++ gcc/cfgexpand.c	(working copy)
@@ -158,11 +158,6 @@ 
   /* The Variable.  */
   tree decl;
 
-  /* The offset of the variable.  During partitioning, this is the
-     offset relative to the partition.  After partitioning, this
-     is relative to the stack frame.  */
-  HOST_WIDE_INT offset;
-
   /* Initially, the size of the variable.  Later, the size of the partition,
      if this variable becomes it's partition's representative.  */
   HOST_WIDE_INT size;
@@ -267,7 +262,6 @@ 
   v = &stack_vars[stack_vars_num];
 
   v->decl = decl;
-  v->offset = 0;
   v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1);
   /* Ensure that all variables have size, so that &a != &b for any two
      variables that are simultaneously live.  */
@@ -403,9 +397,9 @@ 
     return (int)largeb - (int)largea;
 
   /* Secondary compare on size, decreasing  */
-  if (sizea < sizeb)
-    return -1;
   if (sizea > sizeb)
+    return -1;
+  if (sizea < sizeb)
     return 1;
 
   /* Tertiary compare on true alignment, decreasing.  */
@@ -564,28 +558,19 @@ 
 
 /* A subroutine of partition_stack_vars.  The UNION portion of a UNION/FIND
    partitioning algorithm.  Partitions A and B are known to be non-conflicting.
-   Merge them into a single partition A.
+   Merge them into a single partition A.  */
 
-   At the same time, add OFFSET to all variables in partition B.  At the end
-   of the partitioning process we've have a nice block easy to lay out within
-   the stack frame.  */
-
 static void
-union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset)
+union_stack_vars (size_t a, size_t b)
 {
-  size_t i, last;
   struct stack_var *vb = &stack_vars[b];
   bitmap_iterator bi;
   unsigned u;
 
-  /* Update each element of partition B with the given offset,
-     and merge them into partition A.  */
-  for (last = i = b; i != EOC; last = i, i = stack_vars[i].next)
-    {
-      stack_vars[i].offset += offset;
-      stack_vars[i].representative = a;
-    }
-  stack_vars[last].next = stack_vars[a].next;
+  gcc_assert (stack_vars[b].next == EOC);
+   /* Add B to A's partition.  */
+  stack_vars[b].next = stack_vars[a].next;
+  stack_vars[b].representative = a;
   stack_vars[a].next = b;
 
   /* Update the required alignment of partition A to account for B.  */
@@ -596,7 +581,8 @@ 
   if (vb->conflicts)
     {
       EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi)
-	add_stack_var_conflict (a, stack_vars[u].representative);
+        if (stack_vars[u].next == EOC && stack_vars[u].representative == u)
+          add_stack_var_conflict (a, u);
       BITMAP_FREE (vb->conflicts);
     }
 }
@@ -605,16 +591,13 @@ 
    partitions constrained by the interference graph.  The overall
    algorithm used is as follows:
 
-	Sort the objects by size.
+	Sort the objects by size in descending order.
 	For each object A {
 	  S = size(A)
 	  O = 0
 	  loop {
 	    Look for the largest non-conflicting object B with size <= S.
 	    UNION (A, B)
-	    offset(B) = O
-	    O += size(B)
-	    S -= size(B)
 	  }
 	}
 */
@@ -636,24 +619,23 @@ 
   for (si = 0; si < n; ++si)
     {
       size_t i = stack_vars_sorted[si];
-      HOST_WIDE_INT isize = stack_vars[i].size;
       unsigned int ialign = stack_vars[i].alignb;
-      HOST_WIDE_INT offset = 0;
 
-      for (sj = si; sj-- > 0; )
+      /* Ignore objects that aren't partition representatives. If we
+         see a var that is not a partition representative, it must
+         have been merged earlier.  */
+      if (stack_vars[i].representative != i)
+        continue;
+
+      for (sj = si + 1; sj < n; ++sj)
 	{
 	  size_t j = stack_vars_sorted[sj];
-	  HOST_WIDE_INT jsize = stack_vars[j].size;
 	  unsigned int jalign = stack_vars[j].alignb;
 
 	  /* Ignore objects that aren't partition representatives.  */
 	  if (stack_vars[j].representative != j)
 	    continue;
 
-	  /* Ignore objects too large for the remaining space.  */
-	  if (isize < jsize)
-	    continue;
-
 	  /* Ignore conflicting objects.  */
 	  if (stack_var_conflict_p (i, j))
 	    continue;
@@ -664,25 +646,8 @@ 
 	      != (jalign * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT))
 	    continue;
 
-	  /* Refine the remaining space check to include alignment.  */
-	  if (offset & (jalign - 1))
-	    {
-	      HOST_WIDE_INT toff = offset;
-	      toff += jalign - 1;
-	      toff &= -(HOST_WIDE_INT)jalign;
-	      if (isize - (toff - offset) < jsize)
-		continue;
-
-	      isize -= toff - offset;
-	      offset = toff;
-	    }
-
 	  /* UNION the objects, placing J at OFFSET.  */
-	  union_stack_vars (i, j, offset);
-
-	  isize -= jsize;
-	  if (isize == 0)
-	    break;
+	  union_stack_vars (i, j);
 	}
     }
 
@@ -712,9 +677,8 @@ 
 	{
 	  fputc ('\t', dump_file);
 	  print_generic_expr (dump_file, stack_vars[j].decl, dump_flags);
-	  fprintf (dump_file, ", offset " HOST_WIDE_INT_PRINT_DEC "\n",
-		   stack_vars[j].offset);
 	}
+      fputc ('\n', dump_file);
     }
 }
 
@@ -863,10 +827,9 @@ 
 	 partition.  */
       for (j = i; j != EOC; j = stack_vars[j].next)
 	{
-	  gcc_assert (stack_vars[j].offset <= stack_vars[i].size);
 	  expand_one_stack_var_at (stack_vars[j].decl,
 				   base, base_align,
-				   stack_vars[j].offset + offset);
+				   offset);
 	}
     }