From patchwork Fri Jun  7 15:14:36 2013
X-Patchwork-Submitter: James Greenhalgh
X-Patchwork-Id: 249739
From: James Greenhalgh
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, law@redhat.com, dnovillo@google.com, amacleod@redhat.com, ook@ucw.cz, sellcey@imgtec.com
Subject: [Patch tree-ssa] RFC: Enable path threading for control variables (PR tree-optimization/54742).
Date: Fri, 7 Jun 2013 16:14:36 +0100
Message-Id: <1370618076-5040-1-git-send-email-james.greenhalgh@arm.com>

Hello,

After seeing Steve Ellcey's patch at:

http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00667.html

I've been playing with pushing the same logic closer to the current
jump-threading framework. I've split off Steve's patch into path search
code (added to tree-ssa-threadedge.c) and path copy code (added to
tree-ssa-threadupdate.c).

This patch survives bootstrap and regression runs on arm and x86 with no
problems. As it stands, the new code is run unconditionally in the same
places jump threading would be, making it active at -O1 and above. With
this, I saw no significant impact on compile times as measured by
--with-build-config=bootstrap-time.

I'm not sure this is what Jeff Law intended when he asked that the code
be moved to the existing jump-threading framework. I found that
gimple_duplicate_sese_region did a much better job of doing the right
thing than I had expected, so I had no need to touch the block update
code in tree-ssa-threadupdate.c.
As for deciding when it is good to perform this optimisation, I use
Steve's max_insn_count and max_path_count values, though clearly, if
this approach seems acceptable, I can add command-line flags as
appropriate.

In terms of touching generic code, I had to change
gimple_duplicate_sese_region to patch things up when copying across
loop headers/latches which are not entry->dest/src. I rely on the
assumption that, having copied a region, if block n was the loop header
of the original region, then block n will be the loop header of the new
region, and I use this to patch up the new loop information. My other
generic change ensures that simplify_control_stmt_condition in
tree-ssa-threadedge.c returns something for us to analyse even if it
cannot simplify the condition to a gimple min invariant.

Beyond that, the path search code is modified from Steve's patch to
perform only one depth-first search, and the path copy code is modified
to purge the region entry and exit edges from those which traditional
jump threading may consider.

Opinions?

Thanks,
James Greenhalgh

---
gcc/

2013-06-07  James Greenhalgh  <james.greenhalgh@arm.com>

	PR tree-optimization/54742
	* tree-cfg.c (gimple_duplicate_sese_region): Memoize loop latch
	and loop header blocks if copying across a latch/header.
	* tree-flow.h (thread_paths): New struct.
	(register_thread_paths): Declare.
	* tree-ssa-threadedge.c (simplify_control_stmt_condition): Permit
	returning something not in gimple_min_invariant form.
	(max_insn_count): Define.
	(max_path_count): Likewise.
	(find_thread_path_1): New function.
	(find_thread_path): Likewise.
	(save_new_thread_path): Likewise.
	(find_control_statement_thread_paths): Likewise.
	(thread_across_edge): Handle non gimple_min_invariant cases.
	* tree-ssa-threadupdate.c (thread_paths_vec): New.
	(remove_edge_from_thread_candidates): New function.
	(duplicate_thread_path): Likewise.
	(copy_control_statement_thread_paths): Likewise.
	(thread_through_all_blocks): Handle thread_paths.
	(register_thread_paths): New function.
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 4b91a35..6dcd2e4 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5717,10 +5717,12 @@ gimple_duplicate_sese_region (edge entry, edge exit,
 {
   unsigned i;
   bool free_region_copy = false, copying_header = false;
+  bool save_loop_details = false;
   struct loop *loop = entry->dest->loop_father;
   edge exit_copy;
   vec<basic_block> doms;
   edge redirected;
+  int memo_loop_header_no = 0, memo_loop_latch_no = 0;
   int total_freq = 0, entry_freq = 0;
   gcov_type total_count = 0, entry_count = 0;
@@ -5738,9 +5740,15 @@ gimple_duplicate_sese_region (edge entry, edge exit,
       if (region[i]->loop_father != loop)
 	return false;
 
-      if (region[i] != entry->dest
-	  && region[i] == loop->header)
-	return false;
+      /* If we are copying an abnormally shaped region, keep track of
+	 which block will become our loop header.  */
+      if ((region[i] != entry->dest && region[i] == loop->header)
+	  || (region[i] != entry->src && region[i] == loop->latch))
+	{
+	  save_loop_details = true;
+	  memo_loop_latch_no = i;
+	  memo_loop_header_no = i;
+	}
     }
 
   set_loop_copy (loop, loop);
@@ -5821,6 +5829,13 @@ gimple_duplicate_sese_region (edge entry, edge exit,
       loop->latch = exit->src;
     }
 
+  /* Restore loop details if we were asked to save them.  */
+  if (save_loop_details)
+    {
+      loop->header = region_copy[memo_loop_header_no];
+      loop->latch = region_copy[memo_loop_latch_no];
+    }
+
   /* Redirect the entry and add the phi node arguments.  */
   redirected = redirect_edge_and_branch (entry, get_bb_copy (entry->dest));
   gcc_assert (redirected != NULL);
diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index 24fcfbf..c16a1ba 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -604,6 +604,13 @@ struct tree_niter_desc
   enum tree_code cmp;
 };
 
+typedef struct thread_paths
+{
+  basic_block **paths;
+  int *path_sizes;
+  int path_count;
+} thread_paths;
+
 /* In tree-ssa-phiopt.c */
 bool empty_block_p (basic_block);
 basic_block *blocks_in_phiopt_order (void);
@@ -751,6 +758,7 @@ bool may_be_nonaddressable_p (tree expr);
 /* In tree-ssa-threadupdate.c.  */
 extern bool thread_through_all_blocks (bool);
 extern void register_jump_thread (edge, edge, edge);
+extern void register_thread_paths (thread_paths *);
 
 /* In gimplify.c */
 tree force_gimple_operand_1 (tree, gimple_seq *, gimple_predicate, tree);
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index b31e961..56aee59 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -542,6 +542,7 @@ simplify_control_stmt_condition (edge e,
      rather than use a relational operator.  These are simpler to handle.  */
   if (TREE_CODE (cond) == SSA_NAME)
     {
+      tree cached_cached_lhs = NULL;
       cached_lhs = cond;
 
       /* Get the variable's current value from the equivalence chains.
@@ -559,10 +560,17 @@ simplify_control_stmt_condition (edge e,
       if (handle_dominating_asserts && TREE_CODE (cached_lhs) == SSA_NAME)
 	cached_lhs = lhs_of_dominating_assert (cached_lhs, e->src, stmt);
 
+      cached_cached_lhs = cached_lhs;
       /* If we haven't simplified to an invariant yet, then use the
	 pass specific callback to try and simplify it further.  */
       if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
 	cached_lhs = (*simplify) (stmt, stmt);
+
+      /* We couldn't find an invariant.  But, callers of this
+	 function may be able to do something useful with the phi
+	 node.  */
+      if (!cached_lhs)
+	cached_lhs = cached_cached_lhs;
     }
   else
     cached_lhs = NULL;
@@ -829,6 +837,215 @@ phi_args_equal_on_edges (edge e1, edge e2)
   return true;
 }
 
+static int max_insn_count = 100;
+static int max_path_count = 20;
+
+/* Helper function for find_thread_path.  VISITED_BBS is used to make sure
+   we don't fall into an infinite loop.  */
+
+static bool
+find_thread_path_1 (basic_block start_bb, basic_block end_bb,
+		    struct pointer_set_t *visited_bbs,
+		    vec<basic_block, va_gc> *&path)
+{
+  edge_iterator ei;
+  edge e;
+
+  if (start_bb == end_bb)
+    {
+      vec_safe_push (path, start_bb);
+      return true;
+    }
+
+  if (!pointer_set_insert (visited_bbs, start_bb))
+    {
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_thread_path_1 (e->dest, end_bb, visited_bbs, path))
+	  {
+	    vec_safe_push (path, start_bb);
+	    return true;
+	  }
+    }
+  return false;
+}
+
+/* Return true if there is at least one path from START_BB to END_BB.  */
+
+static bool
+find_thread_path (basic_block start_bb, basic_block end_bb,
+		  vec<basic_block, va_gc> *&path)
+{
+  edge_iterator ei;
+  edge e;
+  struct pointer_set_t *visited_bbs;
+  bool p = false;
+
+  if (start_bb == end_bb)
+    {
+      vec_safe_push (path, start_bb);
+      return true;
+    }
+
+  visited_bbs = pointer_set_create ();
+  if (!pointer_set_insert (visited_bbs, start_bb))
+    {
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_thread_path_1 (e->dest, end_bb, visited_bbs, path))
+	  {
+	    vec_safe_push (path, start_bb);
+	    p = true;
+	    break;
+	  }
+    }
+  pointer_set_destroy (visited_bbs);
+  return p;
+}
+
+/* We save the paths we want to copy in PATHS.  PATHS->path_count is the
+   number of paths saved, PATHS->paths[i] is the list of basic blocks in
+   one path.  Each path starts with the block where a variable is assigned
+   a constant value (PATHS->paths[i][0]) and ends with the control statement
+   block (PATHS->paths[i][PATHS->path_sizes[i]-2]) and then the block
+   that the control statement is going to go to given the constant value
+   of the variable (PATHS->paths[i][PATHS->path_sizes[i]-1]).
+
+   PATH[0] is the block with the control statement,
+   PATH[n - 1] is the block where the control statement variable
+   is assigned a constant value, and
+   the entries in between make a (reverse) path between the two.
+
+   We don't want to change PATH; we want to leave it alone and copy
+   the blocks to PATHS->paths so that we wind up with an array of
+   paths that we want to update.  We also want to add the block that the
+   control statement is going to go to onto the list so that we know
+   which exit from the control statement is important.  */
+
+static void
+save_new_thread_path (vec<basic_block, va_gc> *&path,
+		      tree val, thread_paths *paths)
+{
+  int i;
+  int insn_count;
+  basic_block bb;
+  edge taken_edge;
+  gimple_stmt_iterator gsi;
+  int path_count = paths->path_count;
+  int n = path->length ();
+
+  if (n <= 1)
+    return;
+
+  if (path_count >= max_path_count)
+    return;
+
+  /* Put the blocks in 'correct' order and add in where we want to go after
+     the control statement.  We want to leave PATH untouched for future
+     calls.  */
+  paths->paths[path_count] = XNEWVEC (basic_block, n + 1);
+  for (i = 0; i < n; i++)
+    paths->paths[path_count][i] = (*path)[n - i - 1];
+
+  taken_edge = find_taken_edge ((*path)[0], val);
+  paths->paths[path_count][n] = taken_edge->dest;
+
+  paths->path_sizes[path_count] = n + 1;
+
+  /* Count how many instructions are in the blocks we are going to
+     duplicate and if there are too many do not save this path
+     (return without incrementing PATHS->path_count).  */
+  insn_count = 0;
+  for (i = 1; i < n; i++)
+    {
+      bb = paths->paths[path_count][i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	insn_count += 1;
+    }
+
+  if (insn_count > max_insn_count)
+    {
+      /* Otherwise, we would leak this allocation on the next call.  */
+      XDELETEVEC (paths->paths[path_count]);
+      return;
+    }
+
+  paths->path_count++;
+}
+
+/* CTRL_STMT is a COND_EXPR or SWITCH_EXPR statement whose controlling
+   expression is the variable EXPR.  We trace the value of the variable back
+   through any phi nodes looking for places where it gets a constant
+   value and save the path in PATH.  Then we call save_new_thread_path
+   to create a list of such paths.  */
+
+static void
+find_control_statement_thread_paths (tree expr, gimple ctrl_stmt,
+				     struct pointer_set_t *visited_phis,
+				     thread_paths *paths,
+				     vec<basic_block, va_gc> *&path)
+{
+  gimple def_stmt;
+  tree var;
+  unsigned int i;
+  edge e;
+  edge_iterator ei;
+  basic_block var_bb;
+  int e_count;
+  vec<basic_block, va_gc> *next_path;
+  basic_block bbx = path->last ();
+
+  var = SSA_NAME_VAR (expr);
+  def_stmt = SSA_NAME_DEF_STMT (expr);
+  var_bb = gimple_bb (def_stmt);
+
+  if (var == NULL || var_bb == NULL)
+    return;
+
+  /* We have a variable definition (VAR) that is defined in VAR_BB.
+     We want to put the path from VAR_BB to the current block into
+     NEXT_PATH.  If there is more than one path, skip this and don't
+     try to do the optimization.  */
+  e_count = 0;
+  vec_alloc (next_path, n_basic_blocks);
+  FOR_EACH_EDGE (e, ei, bbx->preds)
+    {
+      if (find_thread_path (var_bb, e->src, next_path))
+	e_count = e_count + 1;
+      if (e_count > 1)
+	break;
+    }
+
+  /* We can't proceed if more than one edge may form a path.  */
+  if (e_count != 1)
+    {
+      vec_free (next_path);
+      return;
+    }
+
+  vec_safe_splice (path, next_path);
+
+  if ((gimple_code (def_stmt) == GIMPLE_PHI)
+      && !pointer_set_insert (visited_phis, def_stmt))
+    {
+      for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
+	{
+	  tree arg = gimple_phi_arg_def (def_stmt, i);
+	  vec_safe_push (path, gimple_phi_arg_edge (def_stmt, i)->src);
+	  if (arg && (TREE_CODE (arg) == INTEGER_CST))
+	    save_new_thread_path (path, arg, paths);
+	  else if (arg && (TREE_CODE (arg) == SSA_NAME))
+	    find_control_statement_thread_paths (arg, ctrl_stmt,
+						 visited_phis,
+						 paths, path);
+	  path->pop ();
+	}
+    }
+
+  vec_safe_truncate (path, (path->length () - next_path->length ()));
+  vec_free (next_path);
+}
+
 /* We are exiting E->src, see if E->dest ends with a conditional jump
    which has a known value when reached via E.
@@ -943,12 +1160,48 @@ thread_across_edge (gimple dummy_cond,
 	  register_jump_thread (e, taken_edge, NULL);
 	  return;
 	}
+      else if (cond && !is_gimple_min_invariant (cond))
+	{
+	  /* Try to find paths from a control statement back through
+	     the PHI nodes which would affect that control statement.
+	     Record these opportunities for later.  */
+	  thread_paths *paths = XNEW (thread_paths);
+	  struct pointer_set_t *visited_phis;
+	  vec<basic_block, va_gc> *path;
+	  vec_alloc (path, n_basic_blocks);
+
+	  paths->path_count = 0;
+	  paths->path_sizes = XNEWVEC (int, max_path_count);
+	  paths->paths = XNEWVEC (basic_block *, max_path_count);
+
+	  visited_phis = pointer_set_create ();
+
+	  vec_safe_push (path, e->dest);
+	  find_control_statement_thread_paths (cond, stmt, visited_phis,
+					       paths, path);
+
+	  pointer_set_destroy (visited_phis);
+	  vec_free (path);
+
+	  if (paths->path_count > 0)
+	    {
+	      register_thread_paths (paths);
+	      return;
+	    }
+
+	  /* This cleanup is the responsibility of tree-ssa-threadupdate.c
+	     if we found any paths.  */
+	  XDELETEVEC (paths->paths);
+	  XDELETEVEC (paths->path_sizes);
+	  XDELETE (paths);
+	}
     }
 
-  /* We were unable to determine what out edge from E->dest is taken.  However,
-     we might still be able to thread through successors of E->dest.  This
-     often occurs when E->dest is a joiner block which then fans back out
-     based on redundant tests.
+  /* We were unable to determine what out edge from E->dest is taken, and we
+     could not find any threadable paths.  However, we might still be able
+     to thread through successors of E->dest.  This often occurs when
+     E->dest is a joiner block which then fans back out based on redundant
+     tests.
 
      If so, we'll copy E->dest and redirect the appropriate predecessor to
      the copy.  Within the copy of E->dest, we'll thread one or more edges
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 0e4cbc9..4692241 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -169,6 +169,10 @@ struct ssa_local_info_t
    (original_edge, target_edge).  */
 static vec<edge> threaded_edges;
 
+/* For more advanced jump threading, make a note of the whole
+   interesting path.  */
+static vec<thread_paths *> thread_paths_vec;
+
 /* When we start updating the CFG for threading, data necessary for jump
    threading is attached to the AUX field for the incoming edge.  Use these
    macros to access the underlying structure attached to the AUX field.  */
@@ -1183,6 +1187,75 @@ mark_threaded_blocks (bitmap threaded_blocks)
   BITMAP_FREE(tmp);
 }
 
+/* Remove any edge triplet from the candidate jump-threads in threaded_edges
+   for which E is a member.  */
+
+static void
+remove_edge_from_thread_candidates (edge e)
+{
+  unsigned int i;
+
+  for (i = 0; i < threaded_edges.length (); i += 3)
+    {
+      if (threaded_edges[i] == e
+	  || threaded_edges[i + 1] == e
+	  || threaded_edges[i + 2] == e)
+	{
+	  /* By doing an unordered remove in reverse order, we
+	     maintain the intended (triplet of edges) element
+	     ordering.  */
+	  threaded_edges.unordered_remove (i + 2);
+	  threaded_edges.unordered_remove (i + 1);
+	  threaded_edges.unordered_remove (i);
+	  /* Rescan the previous three elements.  */
+	  i -= 3;
+	}
+    }
+}
+
+/* Call gimple_duplicate_sese_region to duplicate the blocks in BB_LIST.
+   We free and recalculate all ssa, loop, cfg and dominance information
+   afterwards because the region being copied is not really
+   SESE and so we cannot trust gimple_duplicate_sese_region to correctly
+   update the dataflow information.  */
+
+static void
+duplicate_thread_path (basic_block *bb_list, int bb_count)
+{
+  edge orig_edge, exit_edge;
+  orig_edge = find_edge (bb_list[0], bb_list[1]);
+  exit_edge = find_edge (bb_list[bb_count - 2], bb_list[bb_count - 1]);
+  bool success = gimple_duplicate_sese_region (orig_edge, exit_edge,
+					       &bb_list[1], bb_count - 2,
+					       NULL, 0);
+
+  if (success)
+    {
+      /* Some edges will no longer be safe to thread across.  In the
+	 very worst case, we will miss an optimisation opportunity
+	 by doing this.  */
+      remove_edge_from_thread_candidates (orig_edge);
+      remove_edge_from_thread_candidates (exit_edge);
+      /* Clean up our mess.  */
+      free_dominance_info (CDI_DOMINATORS);
+    }
+}
+
+/* Go through the paths saved in PATHS->paths and make copies of them.  */
+
+static void
+copy_control_statement_thread_paths (thread_paths *paths)
+{
+  int i;
+  for (i = 0; i < paths->path_count; i++)
+    {
+      /* For each path, loop through and copy each block in the path
+	 (except the first one, where the constant is assigned, and the
+	 final one, which the control statement branches to).  */
+      if (!single_pred_p (paths->paths[i][1]))
+	duplicate_thread_path (paths->paths[i], paths->path_sizes[i]);
+    }
+}
+
 /* Walk through all blocks and thread incoming edges to the appropriate
    outgoing edge for each edge pair recorded in THREADED_EDGES.
@@ -1204,6 +1277,8 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   bitmap threaded_blocks;
   struct loop *loop;
   loop_iterator li;
+  int j;
+  thread_paths *paths;
 
   /* We must know about loops in order to preserve them.  */
   gcc_assert (current_loops != NULL);
@@ -1211,6 +1286,20 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   if (!threaded_edges.exists ())
     return false;
 
+  /* Threading the paths in thread_paths_vec can cause major damage to
+     the dominance information, ssa names and loop forms.  Do these
+     paths first.  */
+  while (!thread_paths_vec.is_empty ())
+    {
+      paths = thread_paths_vec.pop ();
+      copy_control_statement_thread_paths (paths);
+      XDELETEVEC (paths->path_sizes);
+      for (j = 0; j < paths->path_count; j++)
+	XDELETEVEC (paths->paths[j]);
+      XDELETEVEC (paths->paths);
+      XDELETE (paths);
+    }
+
   threaded_blocks = BITMAP_ALLOC (NULL);
   memset (&thread_stats, 0, sizeof (thread_stats));
@@ -1283,3 +1372,20 @@ register_jump_thread (edge e, edge e2, edge e3)
   threaded_edges.safe_push (e2);
   threaded_edges.safe_push (e3);
 }
+
+/* Register thread_paths found through analysis of expressions
+   used by control statements.  */
+
+void
+register_thread_paths (thread_paths *tps)
+{
+  if (!thread_paths_vec.exists ())
+    thread_paths_vec.create (15);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file,
+	     "  Registering control statement jump thread\n");
+
+  thread_paths_vec.safe_push (tps);
+}