From patchwork Thu Jun 24 15:53:37 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxim Kuvyrkov X-Patchwork-Id: 56806 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 26130B6F44 for ; Fri, 25 Jun 2010 01:53:58 +1000 (EST) Received: (qmail 17919 invoked by alias); 24 Jun 2010 15:53:55 -0000 Received: (qmail 17877 invoked by uid 22791); 24 Jun 2010 15:53:52 -0000 X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL, BAYES_00, TW_EP, TW_PX, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 24 Jun 2010 15:53:42 +0000 Received: (qmail 1026 invoked from network); 24 Jun 2010 15:53:40 -0000 Received: from unknown (HELO ?172.16.1.24?) (maxim@127.0.0.2) by mail.codesourcery.com with ESMTPA; 24 Jun 2010 15:53:40 -0000 Message-ID: <4C237F81.9090807@codesourcery.com> Date: Thu, 24 Jun 2010 19:53:37 +0400 From: Maxim Kuvyrkov User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.10) Gecko/20100512 Thunderbird/3.0.5 MIME-Version: 1.0 To: Jeff Law CC: Steven Bosscher , gcc-patches Subject: Re: 0005-Search-all-dominated-blocks-for-expressions-to-hoist.patch References: <4C18F225.2040509@codesourcery.com> <4C18F491.2000406@codesourcery.com> <4C1FB369.4080006@redhat.com> <4C1FC91D.401@redhat.com> <4C20AB99.6080807@codesourcery.com> <4C225BA4.9070603@codesourcery.com> In-Reply-To: <4C225BA4.9070603@codesourcery.com> X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On 6/23/10 11:08 PM, Maxim Kuvyrkov wrote: ... >> This can be addressed with a walk over the dominator tree after we >> compute VBEout. Start with the root and descend in the tree keeping a >> bitset of expressions that should be alive up the tree. If current node >> >> 1. has a single successor, >> 2. has i'th expression set in VBEout, >> 3. the successor has i'th expression set in VBEout, >> 4. current node doesn't generate i'th expression, >> 5. i'th expression is not marked in the bitset as required up the tree, >> >> than we can hoist i'th expression in the successor with the same result >> as in the current node and not unnecessarily extend live ranges. There >> maybe a couple more details to the above, but the problem should be >> easily fixable. > > This is implemented as cleanup_code_hoist_vbeout() function. The > solution it produces is OK from correctness point of view (it removes > bits from VBEout), but, please, *check my reasoning* to make sure it > doesn't remove from VBEout expressions it shouldn't. There is a flaw in the implementation I posted yesterday. VBEout sets have to be cleaned up considering data both downward and upward the dominator tree; see new example and comments in compute_code_hoist_vbeinout. This updated patch corrects the cleaning routine and adds several comments to annotate its actions. Does this look OK? diff --git a/gcc/basic-block.h b/gcc/basic-block.h index 135c0c2..1bf192d 100644 --- a/gcc/basic-block.h +++ b/gcc/basic-block.h @@ -854,6 +854,8 @@ extern VEC (basic_block, heap) *get_dominated_by (enum cdi_direction, basic_bloc extern VEC (basic_block, heap) *get_dominated_by_region (enum cdi_direction, basic_block *, unsigned); +extern VEC (basic_block, heap) *get_dominated_to_depth (enum cdi_direction, + basic_block, int); extern VEC (basic_block, heap) *get_all_dominated_blocks (enum cdi_direction, basic_block); extern void add_to_dominance_info (enum cdi_direction, basic_block); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9e517e9..05ebcf0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8195,6 +8195,12 @@ when @option{-ftree-vectorize} is used. The number of iterations after vectorization needs to be greater than the value specified by this option to allow vectorization. The default value is 0. +@item max-hoist-depth +The depth of search in the dominator tree for expressions to hoist. +This is used to avoid quadratic behavior in hoisting algorithm. +The value of 0 will avoid limiting the search, but may slow down compilation +of huge functions. The default value is 30. + @item max-unrolled-insns The maximum number of instructions that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times diff --git a/gcc/dominance.c b/gcc/dominance.c index 9c2dcf0..7861439 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -783,16 +783,20 @@ get_dominated_by_region (enum cdi_direction dir, basic_block *region, } /* Returns the list of basic blocks including BB dominated by BB, in the - direction DIR. The vector will be sorted in preorder. */ + direction DIR up to DEPTH in the dominator tree. The DEPTH of zero will + produce a vector containing all dominated blocks. The vector will be sorted + in preorder. */ VEC (basic_block, heap) * -get_all_dominated_blocks (enum cdi_direction dir, basic_block bb) +get_dominated_to_depth (enum cdi_direction dir, basic_block bb, int depth) { VEC(basic_block, heap) *bbs = NULL; unsigned i; + unsigned next_level_start; i = 0; VEC_safe_push (basic_block, heap, bbs, bb); + next_level_start = 1; /* = VEC_length (basic_block, bbs); */ do { @@ -803,12 +807,24 @@ get_all_dominated_blocks (enum cdi_direction dir, basic_block bb) son; son = next_dom_son (dir, son)) VEC_safe_push (basic_block, heap, bbs, son); + + if (i == next_level_start && --depth) + next_level_start = VEC_length (basic_block, bbs); } - while (i < VEC_length (basic_block, bbs)); + while (i < next_level_start); return bbs; } +/* Returns the list of basic blocks including BB dominated by BB, in the + direction DIR. The vector will be sorted in preorder. */ + +VEC (basic_block, heap) * +get_all_dominated_blocks (enum cdi_direction dir, basic_block bb) +{ + return get_dominated_to_depth (dir, bb, 0); +} + /* Redirect all edges pointing to BB to TO. */ void redirect_immediate_dominators (enum cdi_direction dir, basic_block bb, diff --git a/gcc/gcse.c b/gcc/gcse.c index 39660d5..572cfdb 100644 --- a/gcc/gcse.c +++ b/gcc/gcse.c @@ -4155,6 +4155,7 @@ compute_code_hoist_vbeinout (void) { int changed, passes; basic_block bb; + sbitmap tmp1, tmp2; sbitmap_vector_zero (hoist_vbeout, last_basic_block); sbitmap_vector_zero (hoist_vbein, last_basic_block); @@ -4203,6 +4204,130 @@ compute_code_hoist_vbeinout (void) dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]); } } + + /* Now cleanup VBEout to avoid moving expressions too far up. + + We follow two rules to clean up VBEout[BB]: + + 1. If BB does not have any dominated blocks, nothing will ever be hoisted + to BB, so we can just wipe its VBEout clean. + + 2. If an expression can be hoisted both to BB and to a *single* successor + of BB in the dominator tree, then there is no point of hoisting + the expression to BB over BB's successor. Doing otherwise would + unnecessarily extend live ranges. */ + + /* Wipe VBEout of leaf blocks in the dominator tree. */ + FOR_EACH_BB (bb) + if (first_dom_son (CDI_DOMINATORS, bb) == NULL) + sbitmap_zero (hoist_vbeout[bb->index]); + + tmp1 = sbitmap_alloc (expr_hash_table.n_elems); + tmp2 = sbitmap_alloc (expr_hash_table.n_elems); + + /* We cannot cleanup VBEout in a single traversal. There has to be both + upward and downward links when computing VBEout of current block to + avoid removing bits that shouldn't be removed. E.g., consider + the following dominator tree; '*' marks blocks which compute same + expression, the expression can be freely moved; the expected result + is that we move computations of '*' from (3) and (6) to (2). + + 2 + / \ + 3* 4 + / \ + 5 6* + + Doing a depth-first search over this tree without and upward link + will remove the expression from VBEout[4] (there's no point of hoisting + the expression to (4) if it's not computed in both (5) and (6). + When cleaning up VBEout[2] we won't see the expression as needed in (4), + so we will remove it from VBEout[2] leaving it to (3) to calculate + it's own copy of '*'. + + Therefore, we use iterative algorithm to solve this problem with both + upward and downward links. The algorithm obviously converges as at + each iteration we make VBEout sets only smaller. */ + + passes = 0; + changed = 1; + + while (changed) + { + changed = 0; + + FOR_EACH_BB (bb) + { + basic_block son; + bool first_p = true; + + /* Walk through dominated blocks and calculate the set of expressions + that are needed in any one, and only one, of the blocks. + TMP1 is the basis of what we want to remove from VBEout[BB]. */ + for (son = first_dom_son (CDI_DOMINATORS, bb); + son != NULL; + son = next_dom_son (CDI_DOMINATORS, son)) + { + if (first_p) + { + sbitmap_copy (tmp1, hoist_vbeout[son->index]); + first_p = false; + } + else + sbitmap_difference (tmp1, tmp1, hoist_vbeout[son->index]); + } + + if (!first_p) + { + /* Now trim TMP1 to avoid removing too much. */ + + if (bb->prev_bb != ENTRY_BLOCK_PTR) + /* Remove epxressions from TMP1 that are needed upwards. + These are VBEout[parent] minus expressions that are + killed in BB (and, hence, don't get to VBEout[parent] from + BB). */ + { + basic_block parent; + + parent = get_immediate_dominator (CDI_DOMINATORS, bb); + + sbitmap_difference (tmp2, hoist_vbeout[parent->index], + transp[bb->index]); + + sbitmap_difference (tmp1, tmp1, tmp2); + } + + /* Never remove any of expressions computed in BB from + VBEout[BB]. */ + sbitmap_difference (tmp1, tmp1, comp[bb->index]); + + if (sbitmap_any_common_bits (hoist_vbeout[bb->index], tmp1)) + /* There is at least one bit that can be removed from + VBEout[BB]. */ + { + sbitmap_difference (hoist_vbeout[bb->index], + hoist_vbeout[bb->index], tmp1); + changed = 1; + } + } + } + + passes++; + } + + if (dump_file) + { + fprintf (dump_file, "hoisting vbeout cleanup: %d passes\n", passes); + + FOR_EACH_BB (bb) + { + fprintf (dump_file, "vbeout(%d): ", bb->index); + dump_sbitmap_file (dump_file, hoist_vbeout[bb->index]); + } + } + + sbitmap_free (tmp1); + sbitmap_free (tmp2); } /* Top level routine to do the dataflow analysis needed by code hoisting. */ @@ -4212,8 +4337,8 @@ compute_code_hoist_data (void) { compute_local_properties (transp, comp, antloc, &expr_hash_table); compute_transpout (); - compute_code_hoist_vbeinout (); calculate_dominance_info (CDI_DOMINATORS); + compute_code_hoist_vbeinout (); if (dump_file) fprintf (dump_file, "\n"); } @@ -4306,7 +4431,8 @@ hoist_code (void) int found = 0; int insn_inserted_p; - domby = get_dominated_by (CDI_DOMINATORS, bb); + domby = get_dominated_to_depth (CDI_DOMINATORS, bb, MAX_HOIST_DEPTH); + /* Examine each expression that is very busy at the exit of this block. These are the potentially hoistable expressions. */ for (i = 0; i < hoist_vbeout[bb->index]->n_bits; i++) @@ -4397,7 +4523,11 @@ hoist_code (void) it would be safe to compute it at the start of the dominated block. Now we have to determine if the expression would reach the dominated block if it was - placed at the end of BB. */ + placed at the end of BB. + Note: the fact that hoist_exprs has i-th bit set means + that /some/, not necesserilly all, occurences from + the dominated blocks can be hoisted to BB. Here we check + if a specific occurence can be hoisted to BB. */ if (hoist_expr_reaches_here_p (bb, i, dominated, NULL)) { struct expr *expr = index_map[i]; @@ -4410,6 +4540,12 @@ hoist_code (void) occr = occr->next; gcc_assert (occr); + + /* An occurence might've been already deleted + while processing a dominator of BB. */ + if (occr->deleted_p) + continue; + insn = occr->insn; set = single_set (insn); gcc_assert (set); diff --git a/gcc/params.def b/gcc/params.def index 35650ff..f08d482 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -219,6 +219,14 @@ DEFPARAM(PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION, "gcse-after-reload-critical-fraction", "The threshold ratio of critical edges execution count that permit performing redundancy elimination after reload", 10, 0, 0) +/* How deep from a given basic block the dominator tree should be searched + for expressions to hoist to the block. The value of 0 will avoid limiting + the search. */ +DEFPARAM(PARAM_MAX_HOIST_DEPTH, + "max-hoist-depth", + "Maximum depth of search in the dominator tree for expressions to hoist", + 30, 0, 0) + /* This parameter limits the number of insns in a loop that will be unrolled, and by how much the loop is unrolled. diff --git a/gcc/params.h b/gcc/params.h index 833fc3b..c0404ca 100644 --- a/gcc/params.h +++ b/gcc/params.h @@ -125,6 +125,8 @@ typedef enum compiler_param PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_PARTIAL_FRACTION) #define GCSE_AFTER_RELOAD_CRITICAL_FRACTION \ PARAM_VALUE (PARAM_GCSE_AFTER_RELOAD_CRITICAL_FRACTION) +#define MAX_HOIST_DEPTH \ + PARAM_VALUE (PARAM_MAX_HOIST_DEPTH) #define MAX_UNROLLED_INSNS \ PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) #define MAX_SMS_LOOP_NUMBER \