From patchwork Tue Jun 22 18:30:06 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 56561 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 20E58B6F16 for ; Wed, 23 Jun 2010 04:30:22 +1000 (EST) Received: (qmail 6457 invoked by alias); 22 Jun 2010 18:30:21 -0000 Received: (qmail 6381 invoked by uid 22791); 22 Jun 2010 18:30:17 -0000 X-SWARE-Spam-Status: No, hits=-0.8 required=5.0 tests=AWL, BAYES_40, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from nikam-dmz.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 22 Jun 2010 18:30:09 +0000 Received: from localhost (kamzik.ms.mff.cuni.cz [195.113.18.74]) by nikam.ms.mff.cuni.cz (Postfix) with ESMTP id D141D9AC858; Tue, 22 Jun 2010 20:30:06 +0200 (CEST) Received: by localhost (Postfix, from userid 16202) id B66CD37F8F; Tue, 22 Jun 2010 20:30:06 +0200 (CEST) Date: Tue, 22 Jun 2010 20:30:06 +0200 From: Jan Hubicka To: gcc-patches@gcc.gnu.org, steven@gcc.gnu.org, bonzini@gnu.org Subject: More df_worklist_dataflow improvements Message-ID: <20100622183006.GB23996@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, this patch aims to reduce worklist bitmap traffic in df-core. It reduces builds times from 9m32s to 9m15s on my setup (average from 3 runs, each run has less than 10s noise). The patch adds IN_WORKLIST flag for BB that says that BB is in worklist (either workslit or pending). To make lookups fast, I use 1 bit of aux field used currently for age structure. The internal loop of solver walks edges and compute confluences and after updating the transfer function it walks edges again and inserts everything to worklist. Those loops all test considered bitmap, that is rather sily. The first loop consider only BBs with AGE younger than last visit. The patch aranges the last visits to start at 1, so AGEs of BBs not considered are all 0 and confluence is never computed on them. The second loop inserts only things not in worklist yet, so the patch arranges the IN_WORKLIST flag to be true for all basic blocks initially that makes not considered blocks to never enter the queues. The profile now looks as follows: : EXECUTE_IF_SET_IN_BITMAP (worklist, 0, index, bi) : { : unsigned bb_index; 117 0.0101 : dcount++; : 470 0.0404 : bb_index = blocks_in_postorder[index]; 1320 0.1136 : bb = BASIC_BLOCK (bb_index); 135 0.0116 : prev_age = VEC_index (int, last_visit_age, index) + 1; : CLEAR_BB_IN_WORKLIST (bb); 2 1.7e-04 : if (dir == DF_FORWARD) 61 0.0052 : changed = df_worklist_propagate_forward (dataflow, bb_index, : bbindex_to_postorder, : pending, : prev_age); : else 21 0.0018 : changed = df_worklist_propagate_backward (dataflow, bb_index, : bbindex_to_postorder, : pending, : prev_age); 269 0.0231 : VEC_replace (int, last_visit_age, index, age++); 159 0.0137 : if (changed) 192 0.0165 : SET_BB_LAST_CHANGE_AGE (bb, age); : } 11 9.5e-04 : bitmap_clear (worklist); : } for df_worklist_dataflow_doublequeue and 28 0.0024 : bool changed = age == 1; : : /* Calculate of incoming edges. */ 339 0.0292 : if (EDGE_COUNT (bb->preds) > 0) 19 0.0016 : FOR_EACH_EDGE (e, ei, bb->preds) : { 78 0.0067 : if (age <= BB_LAST_CHANGE_AGE (e->src)) 206 0.0177 : changed |= dataflow->problem->con_fun_n (e); : } 11 9.5e-04 : else if (dataflow->problem->con_fun_0) : dataflow->problem->con_fun_0 (bb); : 62 0.0053 : if (changed 108 0.0093 : && dataflow->problem->trans_fun (bb_index)) : { : /* The out set of this block has changed. : Propagate to the outgoing blocks. */ 87 0.0075 : FOR_EACH_EDGE (e, ei, bb->succs) 393 0.0338 : if (!BB_IN_WORKLIST (e->dest)) : { 6 5.2e-04 : unsigned ob_index = e->dest->index; : 97 0.0083 : SET_BB_IN_WORKLIST (e->dest); 9 7.7e-04 : bitmap_set_bit (pending, bbindex_to_postorder[ob_index]); for propagation function. I will experiment with changing the worklist implementation to arrays qsorted after each iteration, bitmap is not really fitting this purpose well, that should improve situation just at EXECUTE_IF_SET_IN_BITMAP (that itself is pretty slow but not visible in oprofile since samples goes to bitmap.h iterator implementation). It seems it would make more sense to have the blocks_in_postorder containing BB pointers rather than indexes too. I've also reordered the loops for better cache locality. The profile is as follows: CPU: AMD64 family10, speed 800 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples % symbol name 47356 4.0743 htab_find_slot_with_hash 16628 1.4306 df_note_compute.42013.3719 14744 1.2685 fast_dce.487977.constprop.555 14451 1.2433 bitmap_set_bit_1 13575 1.1679 bitmap_ior_into 12050 1.0367 df_worklist_dataflow 11446 0.9848 ggc_internal_alloc_stat 11250 0.9679 htab_find_with_hash 10492 0.9027 record_reg_classes.105277.constprop.851.3375 There are reductions in df_worklist_dataflow, bitmap_set_bit1 and bitmap_clear-bit, all about 0.2% (of overall sample count) Bootstrapped/regtested x86_64-linux, OK? Honza * df-core.c (SET_BB_LAST_CHANGE_AGE, BB_IN_WORKLIST, SET_BB_IN_WORKLIST, CLEAR_BB_IN_WORKLIST): New macros. (BB_LAST_CHANGE_AGE): Do not use last bit of aux. (df_worklist_propagate_forward, df_worklist_propagate_backward): Minimal age is 1; do not try to add BB already in worklist; do not use considered bitmap. (df_worklist_dataflow_doublequeue): Minimal age is 1; do not clear aux fields; do not take conisdered bitmaps, managed BB_IN_WORKLIST flag. (df_worklist_dataflow): Do not initialize considered bitmap; invalidate unused bbindex_to_postorder only when checking; initialize LAST_CHANGE_AGE to 1 on BBs we care about; update call of df_worklist_dataflow_doublequeue; clear aux fileds when done. Index: df-core.c =================================================================== --- df-core.c (revision 161199) +++ df-core.c (working copy) @@ -851,8 +851,26 @@ struct rtl_opt_pass pass_df_finish = The general data flow analysis engine. ----------------------------------------------------------------------------*/ +/* We use bb->aux to store information about age and worklist. We always inspect BB structure + when accessing those, so this leads to good memory locality. + + First bit of bb->aux is used to indicate whther BB is in worklist. BBs are all having + aux 0 and they are initially in worklist, so value of 0 indicate that BB is in worklist + and value of 1 indicate that BB is not. This is because we want to pretend BBs not + considered to be in worklist (to avoid inserting them) and this way we save need + to initialize their AUX fields. + + Other bits of AUX are used to represent age. */ + /* Return time BB when it was visited for last time. */ -#define BB_LAST_CHANGE_AGE(bb) ((ptrdiff_t)(bb)->aux) +#define BB_LAST_CHANGE_AGE(bb) (((ptrdiff_t)(bb)->aux) >> 1) +/* Set BB's visit time. */ +#define SET_BB_LAST_CHANGE_AGE(bb, age) (bb)->aux = (void *)(ptrdiff_t)((age) << 1 | ((ptrdiff_t)bb->aux & 1)) +/* Return true when BB scheduled for processing, either in worklist or pending. */ +#define BB_IN_WORKLIST(bb) (!((ptrdiff_t)(bb)->aux & 1)) +/* Set and clear BB in worklist flag. */ +#define SET_BB_IN_WORKLIST(bb) ((bb)->aux = (void *)((ptrdiff_t)(bb)->aux & ~1)) +#define CLEAR_BB_IN_WORKLIST(bb) ((bb)->aux = (void *)((ptrdiff_t)(bb)->aux | 1)) /* Helper function for df_worklist_dataflow. Propagate the dataflow forward. @@ -876,21 +894,19 @@ df_worklist_propagate_forward (struct da unsigned bb_index, unsigned *bbindex_to_postorder, bitmap pending, - sbitmap considered, ptrdiff_t age) { edge e; edge_iterator ei; basic_block bb = BASIC_BLOCK (bb_index); - bool changed = !age; + bool changed = age == 1; /* Calculate of incoming edges. */ if (EDGE_COUNT (bb->preds) > 0) FOR_EACH_EDGE (e, ei, bb->preds) { - if (age <= BB_LAST_CHANGE_AGE (e->src) - && TEST_BIT (considered, e->src->index)) - changed |= dataflow->problem->con_fun_n (e); + if (age <= BB_LAST_CHANGE_AGE (e->src)) + changed |= dataflow->problem->con_fun_n (e); } else if (dataflow->problem->con_fun_0) dataflow->problem->con_fun_0 (bb); @@ -901,12 +917,13 @@ df_worklist_propagate_forward (struct da /* The out set of this block has changed. Propagate to the outgoing blocks. */ FOR_EACH_EDGE (e, ei, bb->succs) - { - unsigned ob_index = e->dest->index; - - if (TEST_BIT (considered, ob_index)) - bitmap_set_bit (pending, bbindex_to_postorder[ob_index]); - } + if (!BB_IN_WORKLIST (e->dest)) + { + unsigned ob_index = e->dest->index; + + SET_BB_IN_WORKLIST (e->dest); + bitmap_set_bit (pending, bbindex_to_postorder[ob_index]); + } return true; } return false; @@ -921,20 +938,18 @@ df_worklist_propagate_backward (struct d unsigned bb_index, unsigned *bbindex_to_postorder, bitmap pending, - sbitmap considered, ptrdiff_t age) { edge e; edge_iterator ei; basic_block bb = BASIC_BLOCK (bb_index); - bool changed = !age; + bool changed = age == 1; /* Calculate of incoming edges. */ if (EDGE_COUNT (bb->succs) > 0) FOR_EACH_EDGE (e, ei, bb->succs) { - if (age <= BB_LAST_CHANGE_AGE (e->dest) - && TEST_BIT (considered, e->dest->index)) + if (age <= BB_LAST_CHANGE_AGE (e->dest)) changed |= dataflow->problem->con_fun_n (e); } else if (dataflow->problem->con_fun_0) @@ -946,12 +961,13 @@ df_worklist_propagate_backward (struct d /* The out set of this block has changed. Propagate to the outgoing blocks. */ FOR_EACH_EDGE (e, ei, bb->preds) - { - unsigned ob_index = e->src->index; - - if (TEST_BIT (considered, ob_index)) - bitmap_set_bit (pending, bbindex_to_postorder[ob_index]); - } + if (!BB_IN_WORKLIST (e->src)) + { + unsigned ob_index = e->src->index; + + SET_BB_IN_WORKLIST (e->src); + bitmap_set_bit (pending, bbindex_to_postorder[ob_index]); + } return true; } return false; @@ -979,7 +995,6 @@ df_worklist_propagate_backward (struct d static void df_worklist_dataflow_doublequeue (struct dataflow *dataflow, bitmap pending, - sbitmap considered, int *blocks_in_postorder, unsigned *bbindex_to_postorder, int n_blocks) @@ -987,12 +1002,11 @@ df_worklist_dataflow_doublequeue (struct enum df_flow_dir dir = dataflow->problem->dir; int dcount = 0; bitmap worklist = BITMAP_ALLOC (&df_bitmap_obstack); - int age = 0; + int age = 1; bool changed; VEC(int, heap) *last_visit_age = NULL; int prev_age; basic_block bb; - int i; VEC_safe_grow_cleared (int, heap, last_visit_age, n_blocks); @@ -1013,28 +1027,26 @@ df_worklist_dataflow_doublequeue (struct unsigned bb_index; dcount++; - bitmap_clear_bit (pending, index); bb_index = blocks_in_postorder[index]; bb = BASIC_BLOCK (bb_index); - prev_age = VEC_index (int, last_visit_age, index); + prev_age = VEC_index (int, last_visit_age, index) + 1; + CLEAR_BB_IN_WORKLIST (bb); if (dir == DF_FORWARD) changed = df_worklist_propagate_forward (dataflow, bb_index, bbindex_to_postorder, - pending, considered, + pending, prev_age); else changed = df_worklist_propagate_backward (dataflow, bb_index, bbindex_to_postorder, - pending, considered, + pending, prev_age); - VEC_replace (int, last_visit_age, index, ++age); + VEC_replace (int, last_visit_age, index, age++); if (changed) - bb->aux = (void *)(ptrdiff_t)age; + SET_BB_LAST_CHANGE_AGE (bb, age); } bitmap_clear (worklist); } - for (i = 0; i < n_blocks; i++) - BASIC_BLOCK (blocks_in_postorder[i])->aux = NULL; BITMAP_FREE (worklist); BITMAP_FREE (pending); @@ -1063,11 +1075,8 @@ df_worklist_dataflow (struct dataflow *d int n_blocks) { bitmap pending = BITMAP_ALLOC (&df_bitmap_obstack); - sbitmap considered = sbitmap_alloc (last_basic_block); - bitmap_iterator bi; unsigned int *bbindex_to_postorder; int i; - unsigned int index; enum df_flow_dir dir = dataflow->problem->dir; gcc_assert (dir != DF_NONE); @@ -1076,21 +1085,20 @@ df_worklist_dataflow (struct dataflow *d bbindex_to_postorder = (unsigned int *)xmalloc (last_basic_block * sizeof (unsigned int)); +#ifdef ENABLE_CHECKING /* Initialize the array to an out-of-bound value. */ for (i = 0; i < last_basic_block; i++) bbindex_to_postorder[i] = last_basic_block; - - /* Initialize the considered map. */ - sbitmap_zero (considered); - EXECUTE_IF_SET_IN_BITMAP (blocks_to_consider, 0, index, bi) - { - SET_BIT (considered, index); - } +#endif /* Initialize the mapping of block index to postorder. */ for (i = 0; i < n_blocks; i++) { bbindex_to_postorder[blocks_in_postorder[i]] = i; + /* Start with Age of 1. Blocks not considered for dataflow + will all have age of 0 that will prevent us from computing + confluence on them. */ + SET_BB_LAST_CHANGE_AGE (BASIC_BLOCK (blocks_in_postorder[i]), 1); /* Add all blocks to the worklist. */ bitmap_set_bit (pending, i); } @@ -1100,11 +1108,14 @@ df_worklist_dataflow (struct dataflow *d dataflow->problem->init_fun (blocks_to_consider); /* Solve it. */ - df_worklist_dataflow_doublequeue (dataflow, pending, considered, + df_worklist_dataflow_doublequeue (dataflow, pending, blocks_in_postorder, bbindex_to_postorder, n_blocks); - sbitmap_free (considered); + + for (i = 0; i < n_blocks; i++) + BASIC_BLOCK (blocks_in_postorder[i])->aux = NULL; + free (bbindex_to_postorder); }