From patchwork Tue Apr 19 00:24:17 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Easwaran Raman X-Patchwork-Id: 91899 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id A02191007DD for ; Tue, 19 Apr 2011 10:24:41 +1000 (EST) Received: (qmail 23052 invoked by alias); 19 Apr 2011 00:24:39 -0000 Received: (qmail 23043 invoked by uid 22791); 19 Apr 2011 00:24:37 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, SPF_HELO_PASS, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (216.239.44.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 19 Apr 2011 00:24:20 +0000 Received: from wpaz29.hot.corp.google.com (wpaz29.hot.corp.google.com [172.24.198.93]) by smtp-out.google.com with ESMTP id p3J0OJxG017354 for ; Mon, 18 Apr 2011 17:24:19 -0700 Received: from pzk36 (pzk36.prod.google.com [10.243.19.164]) by wpaz29.hot.corp.google.com with ESMTP id p3J0NaQx016764 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Mon, 18 Apr 2011 17:24:18 -0700 Received: by pzk36 with SMTP id 36so2802631pzk.32 for ; Mon, 18 Apr 2011 17:24:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.142.125.1 with SMTP id x1mr2428948wfc.296.1303172657543; Mon, 18 Apr 2011 17:24:17 -0700 (PDT) Received: by 10.142.230.18 with HTTP; Mon, 18 Apr 2011 17:24:17 -0700 (PDT) In-Reply-To: References: Date: Mon, 18 Apr 2011 17:24:17 -0700 Message-ID: Subject: Re: Improve stack layout heuristic. From: Easwaran Raman To: Michael Matz Cc: gcc-patches@gcc.gnu.org X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Mon, Apr 18, 2011 at 5:58 AM, Michael Matz wrote: > > Hi, > > [FWIW I can't approve patches, but some feedback nevertheless] > > On Sun, 17 Apr 2011, Easwaran Raman wrote: > > >  This patch impoves the heuristic used in assigning stack location to > > stack variables.  Currently, if there are 3 variables A, B and C with > > their sizes in increasing order and A and C have a conflict, it will > > put A and B in a partition and C in a separate partition with a total > > frame size of sizeof(B) + sizeof(C).  This patch puts B and C in the > > same partition and A in a separate partition, with a total size of > > sizeof(A) + sizeof(C). > > That's the change in stack_var_cmp, to match the comment, right?  Makes > sense. > > > The other change in this patch removes the field offset in struct > > stack_var. Right now, the field is always 0 due to the way it is set in > > partition_stack_vars. > > Huh, indeed, starting with the initial version of that code already.  The > idea clearly was to pack multiple conflicting small objects into the space > of only one larger non-conflicting one by placing them at different > offsets, but that never seems to have been implemented and would require > different tracking of conflicts.  I think removing the offset tracking > right now is okay, can be reintroduced once somebody gets motivated > enough. Yes, the intent seems to be what you describe above. I haven't thought about it much, but I am not sure how much a non-backtracking version will buy over the current heuristic. > > > -     and merge them into partition A.  */ > > -  for (last = i = b; i != EOC; last = i, i = stack_vars[i].next) > > -    { > > -      stack_vars[i].offset += offset; > > -      stack_vars[i].representative = a; > > -    } > > -  stack_vars[last].next = stack_vars[a].next; > > +   /* Add B to A's partition.  */ > > +  stack_vars[b].next = stack_vars[a].next; > > +  stack_vars[b].representative = a; > > Hmm.  This seems fishy.  You don't update the representatives of the > members of the partition that B is the leader of.  Additionally you break > the chain of B's members.  That is only a problem if B actually has more > than one member.  That might be always the case with your changed sorting > order, but it's an important invariant, so please assert this in this > routine. > B always has one member is an invariant in my scheme. The object corresponding to the outer loop index is always the representative. If B has to have more than one members, it must have been visited in the outer loop in some earlier iteration. But then, its index in the stack_vars_sorted must be greater than the current i. I have added an assertion than stack_vars[b].next == EOC in union_stack_vars which is true only when B has one member. > > > @@ -596,7 +581,7 @@ > >    if (vb->conflicts) > >      { > >        EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi) > > -     add_stack_var_conflict (a, stack_vars[u].representative); > > +     add_stack_var_conflict (a, u); > > Please don't.  This uselessly bloats the conflict bitmaps. It is sufficient to add the conflicts of  a variable only when it is not merged into some group.  I am adding a check to that effect. I am attaching a new patch which excludes the strict aliasing check (which I will send separately) and the changes I have mentioned above. Bootstraps and no regressions in tests. Thanks, Easwaran > > Ciao, > Michael. Index: gcc/testsuite/gcc.dg/stack-layout-2.c =================================================================== --- gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-expand" } */ +void bar( char *); +int foo() +{ + int i=0; + { + char a[8000]; + bar(a); + i += a[0]; + } + { + char a[8192]; + char b[32]; + bar(a); + i += a[0]; + bar(b); + i += a[0]; + } + return i; +} +/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */ +/* { dg-final { scan-rtl-dump "size 32" "expand" } } */ Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c (revision 171954) +++ gcc/cfgexpand.c (working copy) @@ -158,11 +158,6 @@ /* The Variable. */ tree decl; - /* The offset of the variable. During partitioning, this is the - offset relative to the partition. After partitioning, this - is relative to the stack frame. */ - HOST_WIDE_INT offset; - /* Initially, the size of the variable. Later, the size of the partition, if this variable becomes it's partition's representative. */ HOST_WIDE_INT size; @@ -267,7 +262,6 @@ v = &stack_vars[stack_vars_num]; v->decl = decl; - v->offset = 0; v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1); /* Ensure that all variables have size, so that &a != &b for any two variables that are simultaneously live. */ @@ -403,9 +397,9 @@ return (int)largeb - (int)largea; /* Secondary compare on size, decreasing */ - if (sizea < sizeb) - return -1; if (sizea > sizeb) + return -1; + if (sizea < sizeb) return 1; /* Tertiary compare on true alignment, decreasing. */ @@ -564,28 +558,19 @@ /* A subroutine of partition_stack_vars. The UNION portion of a UNION/FIND partitioning algorithm. Partitions A and B are known to be non-conflicting. - Merge them into a single partition A. + Merge them into a single partition A. */ - At the same time, add OFFSET to all variables in partition B. At the end - of the partitioning process we've have a nice block easy to lay out within - the stack frame. */ - static void -union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset) +union_stack_vars (size_t a, size_t b) { - size_t i, last; struct stack_var *vb = &stack_vars[b]; bitmap_iterator bi; unsigned u; - /* Update each element of partition B with the given offset, - and merge them into partition A. */ - for (last = i = b; i != EOC; last = i, i = stack_vars[i].next) - { - stack_vars[i].offset += offset; - stack_vars[i].representative = a; - } - stack_vars[last].next = stack_vars[a].next; + gcc_assert (stack_vars[b].next == EOC); + /* Add B to A's partition. */ + stack_vars[b].next = stack_vars[a].next; + stack_vars[b].representative = a; stack_vars[a].next = b; /* Update the required alignment of partition A to account for B. */ @@ -596,7 +581,8 @@ if (vb->conflicts) { EXECUTE_IF_SET_IN_BITMAP (vb->conflicts, 0, u, bi) - add_stack_var_conflict (a, stack_vars[u].representative); + if (stack_vars[u].next == EOC && stack_vars[u].representative == u) + add_stack_var_conflict (a, u); BITMAP_FREE (vb->conflicts); } } @@ -605,16 +591,13 @@ partitions constrained by the interference graph. The overall algorithm used is as follows: - Sort the objects by size. + Sort the objects by size in descending order. For each object A { S = size(A) O = 0 loop { Look for the largest non-conflicting object B with size <= S. UNION (A, B) - offset(B) = O - O += size(B) - S -= size(B) } } */ @@ -636,24 +619,23 @@ for (si = 0; si < n; ++si) { size_t i = stack_vars_sorted[si]; - HOST_WIDE_INT isize = stack_vars[i].size; unsigned int ialign = stack_vars[i].alignb; - HOST_WIDE_INT offset = 0; - for (sj = si; sj-- > 0; ) + /* Ignore objects that aren't partition representatives. If we + see a var that is not a partition representative, it must + have been merged earlier. */ + if (stack_vars[i].representative != i) + continue; + + for (sj = si + 1; sj < n; ++sj) { size_t j = stack_vars_sorted[sj]; - HOST_WIDE_INT jsize = stack_vars[j].size; unsigned int jalign = stack_vars[j].alignb; /* Ignore objects that aren't partition representatives. */ if (stack_vars[j].representative != j) continue; - /* Ignore objects too large for the remaining space. */ - if (isize < jsize) - continue; - /* Ignore conflicting objects. */ if (stack_var_conflict_p (i, j)) continue; @@ -664,25 +646,8 @@ != (jalign * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)) continue; - /* Refine the remaining space check to include alignment. */ - if (offset & (jalign - 1)) - { - HOST_WIDE_INT toff = offset; - toff += jalign - 1; - toff &= -(HOST_WIDE_INT)jalign; - if (isize - (toff - offset) < jsize) - continue; - - isize -= toff - offset; - offset = toff; - } - /* UNION the objects, placing J at OFFSET. */ - union_stack_vars (i, j, offset); - - isize -= jsize; - if (isize == 0) - break; + union_stack_vars (i, j); } } @@ -712,9 +677,8 @@ { fputc ('\t', dump_file); print_generic_expr (dump_file, stack_vars[j].decl, dump_flags); - fprintf (dump_file, ", offset " HOST_WIDE_INT_PRINT_DEC "\n", - stack_vars[j].offset); } + fputc ('\n', dump_file); } } @@ -863,10 +827,9 @@ partition. */ for (j = i; j != EOC; j = stack_vars[j].next) { - gcc_assert (stack_vars[j].offset <= stack_vars[i].size); expand_one_stack_var_at (stack_vars[j].decl, base, base_align, - stack_vars[j].offset + offset); + offset); } }