diff mbox

0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch

Message ID 4C18F491.2000406@codesourcery.com
State New
Headers show

Commit Message

Maxim Kuvyrkov June 16, 2010, 3:58 p.m. UTC
Currently, code hoisting only checks immediately-dominated blocks for 
expressions to hoist.  I wonder if limiting the search for expressions 
is intentional.

This patch makes code hoisting search through all dominated blocks for 
expressions to hoist.

On one hand, hoisting expressions from all dominated blocks, not just 
the immediate dominees, provides significantly greater unification 
opportunities.  On the other hand, it can also substantially extend live 
ranges of pseudos, and increase register pressure.

Even considering the negatives, hoisting expressions from non-immediate 
dominees seems like the right choice to me.  Most expressions that can 
be moved several basic blocks up are constants, and IRA/reload should be 
able to rematerialize those under high register pressure.  On the 
flip-side, if an expression is complex, than it would be less costly to 
spill/restore it instead of calculating it in dominated blocks.

A compromise may be to limit the depth of search with a parameter.

OK to apply?

Thank you,

Comments

Steven Bosscher June 16, 2010, 4:27 p.m. UTC | #1
On Wed, Jun 16, 2010 at 5:58 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> Currently, code hoisting only checks immediately-dominated blocks for
> expressions to hoist.  I wonder if limiting the search for expressions is
> intentional.
>
> This patch makes code hoisting search through all dominated blocks for
> expressions to hoist.

And makes the algorithm quadratic in the size of the CFG. You should
limit the depth not only to avoid excessive live range lengths but
also for corner cases of strangely-formed CFGs.

Ciao!
Steven
Jeff Law June 21, 2010, 6:46 p.m. UTC | #2
On 06/16/10 10:27, Steven Bosscher wrote:
> On Wed, Jun 16, 2010 at 5:58 PM, Maxim Kuvyrkov<maxim@codesourcery.com>  wrote:
>    
>> Currently, code hoisting only checks immediately-dominated blocks for
>> expressions to hoist.  I wonder if limiting the search for expressions is
>> intentional.
>>
>> This patch makes code hoisting search through all dominated blocks for
>> expressions to hoist.
>>      
> And makes the algorithm quadratic in the size of the CFG. You should
> limit the depth not only to avoid excessive live range lengths but
> also for corner cases of strangely-formed CFGs.
>    
Technically true, but we only care about the dominance tree here, not 
the entire CFG.  The change which made code hoisting only look at the 
immediate dominators was a mistake.  It's unfortunate that 
get_dominated_by only returns immediate dominators -- based on the name, 
one could reasonably expect to get the full set of dominators which I 
suspect happened back in 2002 when that change was made.

Maxim -- can you test f90-intrinsic-bit.f  with and without this change 
and report back on how the compilation time changes?

Thanks,
Jeff
Steven Bosscher June 21, 2010, 7:28 p.m. UTC | #3
On Mon, Jun 21, 2010 at 8:46 PM, Jeff Law <law@redhat.com> wrote:
> On 06/16/10 10:27, Steven Bosscher wrote:
>>
>> On Wed, Jun 16, 2010 at 5:58 PM, Maxim Kuvyrkov<maxim@codesourcery.com>
>>  wrote:
>>
>>>
>>> Currently, code hoisting only checks immediately-dominated blocks for
>>> expressions to hoist.  I wonder if limiting the search for expressions is
>>> intentional.
>>>
>>> This patch makes code hoisting search through all dominated blocks for
>>> expressions to hoist.
>>>
>>
>> And makes the algorithm quadratic in the size of the CFG. You should
>> limit the depth not only to avoid excessive live range lengths but
>> also for corner cases of strangely-formed CFGs.
>>
>
> Technically true, but we only care about the dominance tree here, not the
> entire CFG.  The change which made code hoisting only look at the immediate
> dominators was a mistake.  It's unfortunate that get_dominated_by only
> returns immediate dominators -- based on the name, one could reasonably
> expect to get the full set of dominators which I suspect happened back in
> 2002 when that change was made.

I experimented with a patch similar to Maxim's already 2.5 years ago
(and offered to work on it for CS, but there was no interest in this
work at the time :-/)  See these three Bugzilla comments:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13

Note especially the pessimization in comment #13 of PR33828.
Therefore I maintain my objection to this patch.

I hope Maxim will address this and the the other issues of PR33828.

But of course there is much more benefit to be obtained from finishing
the GIMPLE hoisting pass (which I also sent to CS, FWIW). It showed
~1% code size improvement (!) on CSiBE arm-elf when I last toyed with
it, even though it is not quite perfect and hoists things it
shouldn't...

Ciao!
Steven
Jeff Law June 21, 2010, 8:18 p.m. UTC | #4
On 06/21/10 13:28, Steven Bosscher wrote:
>
> I experimented with a patch similar to Maxim's already 2.5 years ago
> (and offered to work on it for CS, but there was no interest in this
> work at the time :-/)  See these three Bugzilla comments:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
>    
Right.  This is precisely the problem with using immediate dominators.  
This doesn't argue that Maxim's approach is wrong or bad for compile 
time performance or anything like that.  It merely raises the same issue.

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c8
>    
Clearly something should be done about this.  If you have a testcase for 
Maxim that would be a help.  One could argue that adding the REG_EQUAL 
note created by hoisting to the hash table is a waste of time, fixing 
that would eliminate this problem.

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13
>
> Note especially the pessimization in comment #13 of PR33828.
> Therefore I maintain my objection to this patch.
>    
Clearly you don't want to hoist any higher than the lowest common 
dominator.  Otherwise you unreasonably lengthen lifetimes.  Maxim will 
need to address this problem as well.

Jeff
Steven Bosscher June 21, 2010, 8:38 p.m. UTC | #5
On Mon, Jun 21, 2010 at 10:18 PM, Jeff Law <law@redhat.com> wrote:
> Clearly you don't want to hoist any higher than the lowest common dominator.
>  Otherwise you unreasonably lengthen lifetimes.  Maxim will need to address
> this problem as well.

Yes, and I need to fix this for my GIMPLE hoisting pass also. Do you
know of an algorithm for this?

Ciao!
Steven
Jeff Law June 21, 2010, 9:35 p.m. UTC | #6
On 06/21/10 14:38, Steven Bosscher wrote:
> On Mon, Jun 21, 2010 at 10:18 PM, Jeff Law<law@redhat.com>  wrote:
>    
>> Clearly you don't want to hoist any higher than the lowest common dominator.
>>   Otherwise you unreasonably lengthen lifetimes.  Maxim will need to address
>> this problem as well.
>>      
> Yes, and I need to fix this for my GIMPLE hoisting pass also. Do you
> know of an algorithm for this?
>    
I recall seeing a lowest common ancestor algorithm in literature 
somewhere, so you could run that on the dominator tree I think.

Jeff
Steven Bosscher June 21, 2010, 9:50 p.m. UTC | #7
On Mon, Jun 21, 2010 at 11:35 PM, Jeff Law <law@redhat.com> wrote:
> I recall seeing a lowest common ancestor algorithm in literature somewhere,
> so you could run that on the dominator tree I think.

Yay. Even wikipedia knows about lowest common ancestor algorithms.

So there you go: even if one knows a language fairly well, you can
still look for something seemingly trivial for two years and not
succeed, simply because you just don't know the right terminology.

I'm going to play with some of those algorithms right away, so that
hopefully GIMPLE hoisting will be ready before the end of stage 1
after all (this was the major blocker for me to continue working on
it...).

Thanks!

Ciao!
Steven
Maxim Kuvyrkov June 22, 2010, 12:24 p.m. UTC | #8
On 6/22/10 12:18 AM, Jeff Law wrote:
> On 06/21/10 13:28, Steven Bosscher wrote:
>>
>> I experimented with a patch similar to Maxim's already 2.5 years ago
>> (and offered to work on it for CS, but there was no interest in this
>> work at the time :-/) See these three Bugzilla comments:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c2
> Right. This is precisely the problem with using immediate dominators.
> This doesn't argue that Maxim's approach is wrong or bad for compile
> time performance or anything like that. It merely raises the same issue.

I agree with Steven that the search is better be constrained, possibly, 
with a large enough constant.  I've added a new parameter and a 
dominance.c function to return dominated blocks up to depth N in the 
dominator tree (with N==1 being immediate dominators and N==0 being all 
dominators).

Does this sound OK?

...
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33828#c13
>>
>> Note especially the pessimization in comment #13 of PR33828.
>> Therefore I maintain my objection to this patch.
> Clearly you don't want to hoist any higher than the lowest common
> dominator. Otherwise you unreasonably lengthen lifetimes. Maxim will
> need to address this problem as well.

This can be addressed with a walk over the dominator tree after we 
compute VBEout.  Start with the root and descend in the tree keeping a 
bitset of expressions that should be alive up the tree.  If current node

1. has a single successor,
2. has i'th expression set in VBEout,
3. the successor has i'th expression set in VBEout,
4. current node doesn't generate i'th expression,
5. i'th expression is not marked in the bitset as required up the tree,

than we can hoist i'th expression in the successor with the same result 
as in the current node and not unnecessarily extend live ranges.  There 
maybe a couple more details to the above, but the problem should be 
easily fixable.

I will post second version of the patch in a couple of days.

Thank you,
Paolo Bonzini June 22, 2010, 2:07 p.m. UTC | #9
On 06/21/2010 11:50 PM, Steven Bosscher wrote:
> On Mon, Jun 21, 2010 at 11:35 PM, Jeff Law<law@redhat.com>  wrote:
>> I recall seeing a lowest common ancestor algorithm in literature somewhere,
>> so you could run that on the dominator tree I think.
>
> Yay. Even wikipedia knows about lowest common ancestor algorithms.

Even dominance.c knows, nearest_common_dominator.

:)

Paolo
Maxim Kuvyrkov June 29, 2010, 6:45 p.m. UTC | #10
On 6/21/10 10:46 PM, Jeff Law wrote:
...
> Technically true, but we only care about the dominance tree here, not
> the entire CFG. The change which made code hoisting only look at the
> immediate dominators was a mistake. It's unfortunate that
> get_dominated_by only returns immediate dominators -- based on the name,
> one could reasonably expect to get the full set of dominators which I
> suspect happened back in 2002 when that change was made.
>
> Maxim -- can you test f90-intrinsic-bit.f with and without this change
> and report back on how the compilation time changes?

Compile time of f90-intrinsic-bit.f doesn't change in any meaningful way.

Regards,
diff mbox

Patch

diff --git a/gcc/gcse.c b/gcc/gcse.c
index 45cab70..a7c7237 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -4327,7 +4327,8 @@  hoist_code (void)
       int found = 0;
       int insn_inserted_p;
 
-      domby = get_dominated_by (CDI_DOMINATORS, bb);
+      domby = get_all_dominated_blocks (CDI_DOMINATORS, bb);
+
       /* Examine each expression that is very busy at the exit of this
 	 block.  These are the potentially hoistable expressions.  */
       for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++)
@@ -4418,7 +4419,11 @@  hoist_code (void)
 		     it would be safe to compute it at the start of the
 		     dominated block.  Now we have to determine if the
 		     expression would reach the dominated block if it was
-		     placed at the end of BB.  */
+		     placed at the end of BB.
+		     Note: the fact that hoist_exprs has i-th bit set means
+		     that /some/, not necesserilly all, occurences from
+		     the dominated blocks can be hoisted to BB.  Here we check
+		     if a specific occurence can be hoisted to BB.  */
 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL))
 		    {
 		      struct expr *expr = index_map[i];
@@ -4431,6 +4436,12 @@  hoist_code (void)
 			occr = occr->next;
 
 		      gcc_assert (occr);
+
+		      /* An occurence might've been already deleted
+			 while processing a dominator of BB.  */
+		      if (occr->deleted_p)
+			continue;
+
 		      insn = occr->insn;
 		      set = single_set (insn);
 		      gcc_assert (set);