From patchwork Fri Sep 18 15:36:29 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Segher Boessenkool X-Patchwork-Id: 519395 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 262A2140E58 for ; Sat, 19 Sep 2015 01:37:14 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=d786Z88r; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; q=dns; s=default; b=NIviFbm9tpE8 dIFBfu+R0OlMdPM/hZ0Xn4mW5ihmINL4cQKPVUsPhQ0UQzf671TbVMWhDW5PlPtz 1G1oztcop9/4VLnSxzhxl30M/tR6uPh6Y22J6+BlRSVxBTnJuK691RZbW6pStH+m f+S/M+jML5V7hOgNJRAowJhfUE6iK9E= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; s=default; bh=bL0fgE+9Bz9rumgTnf P7wPmwCdA=; b=d786Z88r2v4+F/vpIoaOG2imWVtTcf+HToEJ3iqkP192BX3wVV 4tdjXgq1x0Nw/uJR1UD1K67h1TW/8DumQUET+sV1vQ14XbJlvGgFuTwA9ud5VTEu YNBfKvEV7RCEaCMCH0QvUv8LADBc0EVCBNlosededhLy16wivgYgUt5H8= Received: (qmail 130341 invoked by alias); 18 Sep 2015 15:37:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 130331 invoked by uid 89); 18 Sep 2015 15:37:05 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.0 required=5.0 tests=AWL, BAYES_50, KAM_LAZY_DOMAIN_SECURITY, T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: gcc1-power7.osuosl.org Received: from gcc1-power7.osuosl.org (HELO gcc1-power7.osuosl.org) (140.211.15.137) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Fri, 18 Sep 2015 15:37:03 +0000 Received: from gcc1-power7.osuosl.org (localhost [127.0.0.1]) by gcc1-power7.osuosl.org (8.14.6/8.14.6) with ESMTP id t8IFawiE057385; Fri, 18 Sep 2015 08:36:59 -0700 Received: (from segher@localhost) by gcc1-power7.osuosl.org (8.14.6/8.14.6/Submit) id t8IFagbY055393; Fri, 18 Sep 2015 08:36:42 -0700 From: Segher Boessenkool To: gcc-patches@gcc.gnu.org Cc: Segher Boessenkool Subject: [PATCH] shrink-wrap: Handle multiple predecessors of prologue Date: Fri, 18 Sep 2015 08:36:29 -0700 Message-Id: <52f14532eb742ac8d878a185a46a88da7b0326eb.1442588483.git.segher@kernel.crashing.org> X-IsSubscribed: yes The caller of try_shrink_wrapping wants to be returned a single edge to put the prologue on. To make that work even if there are multiple edges (all pointing to the PRO block) that need the prologue, add a new block that becomes the destination of all such edges, and then jumps to PRO. In the general case, some edges to PRO will need to be redirected, and not all edges *can* be redirected. This adds a can_get_prologue function that detects such cases. This then happily can also handle the "prologue clobbers some reg that is live on the edge we want to insert it on" case. Not all (if any?) EDGE_CROSSING edges can be redirected, so handle those the same as EDGE_COMPLEX edges. Maybe they should *be* EDGE_COMPLEX? The "clobber" test tests all registers live in, which is a bit pessimistic (only those live on edges that get the prologue need to be considered). This is the same test as was there before; I haven't measured what the impact of this suboptimality is. Bootstrapped and tested on powerpc64-linux. is this okay for mainline? Segher 2015-09-18 Segher Boessenkool * function.c (thread_prologue_and_epilogue_insns): Delete orig_entry_edge argument to try_shrink_wrapping. * shrink-wrap.c (can_get_prologue): New function. (can_dup_for_shrink_wrapping): Also handle EDGE_CROSSING. (try_shrink_wrapping): Delete orig_entry_edge argument. Use can_get_prologue where needed. Remove code that finds a single edge for the prologue. Remove code that tests if any reg clobbered by the prologue is live on the prologue edge. Remove code that finds the new prologue edge after duplicating blocks. Make a new prologue block and edge. * shrink-wrap.h (try_shrink_wrapping): Delete orig_entry_edge argument. --- gcc/function.c | 2 +- gcc/shrink-wrap.c | 170 ++++++++++++++++++++++++++++++------------------------ gcc/shrink-wrap.h | 4 +- 3 files changed, 99 insertions(+), 77 deletions(-) diff --git a/gcc/function.c b/gcc/function.c index bb75b1c..1095465 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -6144,7 +6144,7 @@ thread_prologue_and_epilogue_insns (void) prologue/epilogue is emitted only around those parts of the function that require it. */ - try_shrink_wrapping (&entry_edge, orig_entry_edge, &bb_flags, prologue_seq); + try_shrink_wrapping (&entry_edge, &bb_flags, prologue_seq); if (split_prologue_seq != NULL_RTX) { diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index 1387594..d2d665f 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -462,6 +462,30 @@ prepare_shrink_wrap (basic_block entry_block) } } +/* Return whether basic block PRO can get the prologue. It can not if it + has incoming complex edges that need a prologue inserted (we make a new + block for the prologue, so those edges would need to be redirected, which + does not work). It also can not if there exist registers live on entry + to PRO that are clobbered by the prologue. */ + +static bool +can_get_prologue (basic_block pro, HARD_REG_SET prologue_clobbered) +{ + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, pro->preds) + if (e->flags & (EDGE_COMPLEX | EDGE_CROSSING) + && !dominated_by_p (CDI_DOMINATORS, e->src, pro)) + return false; + + HARD_REG_SET live; + REG_SET_TO_HARD_REG_SET (live, df_get_live_in (pro)); + if (hard_reg_set_intersect_p (live, prologue_clobbered)) + return false; + + return true; +} + /* Return whether we can duplicate basic block BB for shrink wrapping. We cannot if the block cannot be duplicated at all, or if any of its incoming edges are complex and come from a block that does not require a prologue @@ -478,7 +502,7 @@ can_dup_for_shrink_wrapping (basic_block bb, basic_block pro, unsigned max_size) edge e; edge_iterator ei; FOR_EACH_EDGE (e, ei, bb->preds) - if (e->flags & EDGE_COMPLEX + if (e->flags & (EDGE_COMPLEX | EDGE_CROSSING) && !dominated_by_p (CDI_DOMINATORS, e->src, pro)) return false; @@ -577,14 +601,13 @@ fix_fake_fallthrough_edge (edge e) (bb 4 is duplicated to 5; the prologue is inserted on the edge 5->3). ENTRY_EDGE is the edge where the prologue will be placed, possibly - changed by this function. ORIG_ENTRY_EDGE is the edge where it - would be placed without shrink-wrapping. BB_WITH is a bitmap that, - if we do shrink-wrap, will on return contain the interesting blocks - that run with prologue. PROLOGUE_SEQ is the prologue we will insert. */ + changed by this function. BB_WITH is a bitmap that, if we do shrink- + wrap, will on return contain the interesting blocks that run with + prologue. PROLOGUE_SEQ is the prologue we will insert. */ void -try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, - bitmap_head *bb_with, rtx_insn *prologue_seq) +try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with, + rtx_insn *prologue_seq) { /* If we cannot shrink-wrap, are told not to shrink-wrap, or it makes no sense to shrink-wrap: then do not shrink-wrap! */ @@ -631,6 +654,9 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, IOR_HARD_REG_SET (prologue_used, this_used); note_stores (PATTERN (insn), record_hard_reg_sets, &prologue_clobbered); } + CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); + if (frame_pointer_needed) + CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); /* Find out what registers are set up by the prologue; any use of these cannot happen before the prologue. */ @@ -695,8 +721,9 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, pro->index); /* Now see if we can put the prologue at the start of PRO. Putting it - there might require duplicating a block that cannot be duplicated; - if so, try again with the immediate dominator of PRO, and so on. + there might require duplicating a block that cannot be duplicated, + or in some cases we cannot insert the prologue there at all. If PRO + wont't do, try again with the immediate dominator of PRO, and so on. The blocks that need duplicating are those reachable from PRO but not dominated by it. We keep in BB_WITH a bitmap of the blocks @@ -714,6 +741,16 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, while (!vec.is_empty () && pro != entry) { + while (pro != entry && !can_get_prologue (pro, prologue_clobbered)) + { + gcc_assert (pro != entry); + + pro = get_immediate_dominator (CDI_DOMINATORS, pro); + + bitmap_set_bit (bb_with, pro->index); + vec.quick_push (pro); + } + basic_block bb = vec.pop (); if (!can_dup_for_shrink_wrapping (bb, pro, max_grow_size)) while (!dominated_by_p (CDI_DOMINATORS, bb, pro)) @@ -746,14 +783,18 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, { calculate_dominance_info (CDI_POST_DOMINATORS); + basic_block last_ok = pro; while (pro != entry) { basic_block pre = get_immediate_dominator (CDI_DOMINATORS, pro); - if (dominated_by_p (CDI_POST_DOMINATORS, pre, pro)) - pro = pre; - else + if (!dominated_by_p (CDI_POST_DOMINATORS, pre, pro)) break; + + pro = pre; + if (can_get_prologue (pro, prologue_clobbered)) + last_ok = pro; } + pro = last_ok; free_dominance_info (CDI_POST_DOMINATORS); } @@ -762,62 +803,26 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, fprintf (dump_file, "Bumping back to anticipatable blocks, PRO is now %d\n", pro->index); - /* If there is more than one predecessor of PRO not dominated by PRO, fail. - Also find that single edge that leads to PRO. */ - - bool multi = false; - edge the_edge = 0; - FOR_EACH_EDGE (e, ei, pro->preds) - if (!dominated_by_p (CDI_DOMINATORS, e->src, pro)) - { - if (the_edge) - multi = true; - else - the_edge = e; - } - - if (multi) + if (pro == entry) { - the_edge = orig_entry_edge; - - if (dump_file) - fprintf (dump_file, "More than one candidate edge.\n"); + free_dominance_info (CDI_DOMINATORS); + return; } - if (dump_file) - fprintf (dump_file, "Found candidate edge for shrink-wrapping, %d->%d.\n", - the_edge->src->index, the_edge->dest->index); - - *entry_edge = the_edge; - /* Compute what fraction of the frequency and count of the blocks that run both with and without prologue are for running with prologue. This gives the correct answer for reducible flow graphs; for irreducible flow graphs our profile is messed up beyond repair anyway. */ - int num = (*entry_edge)->probability; - int den = REG_BR_PROB_BASE; - - if (*entry_edge == orig_entry_edge) - goto out; + gcov_type num = 0; + gcov_type den = 0; - /* Test whether the prologue is known to clobber any register - (other than FP or SP) which are live on the edge. */ - - HARD_REG_SET live_on_edge; - CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); - if (frame_pointer_needed) - CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); - REG_SET_TO_HARD_REG_SET (live_on_edge, - df_get_live_in ((*entry_edge)->dest)); - if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) - { - *entry_edge = orig_entry_edge; - if (dump_file) - fprintf (dump_file, - "Shrink-wrapping aborted due to clobber.\n"); - goto out; - } + FOR_EACH_EDGE (e, ei, pro->preds) + if (!dominated_by_p (CDI_DOMINATORS, e->src, pro)) + { + num += EDGE_FREQUENCY (e); + den += e->src->frequency; + } /* All is okay, so do it. */ @@ -852,20 +857,6 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, dup->count -= bb->count; } - /* Change ENTRY_EDGE, if its src is duplicated. Do this first, before - the redirects have had a chance to create new blocks on the edge we - want to use for the prologue, which makes us not find it. */ - - gcc_assert (!dominated_by_p (CDI_DOMINATORS, (*entry_edge)->src, pro)); - - if (bitmap_bit_p (bb_with, (*entry_edge)->src->index)) - { - basic_block src = (basic_block) (*entry_edge)->src->aux; - FOR_EACH_EDGE (e, ei, src->succs) - if (e->dest == pro) - *entry_edge = e; - } - /* Now change the edges to point to the copies, where appropriate. */ FOR_EACH_BB_FN (bb, cfun) @@ -923,7 +914,38 @@ try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, emit_barrier_after_bb (e->src); } -out: + /* Finally, we want a single edge to put the prologue on. Make a new + block before the PRO block; the edge beteen them is the edge we want. + Then redirect those edges into PRO that come from blocks without the + prologue, to point to the new block instead. The new prologue block + is put at the end (instead of right before the PRO block) so we do not + have to worry about any fallthrough edges. */ + + basic_block new_bb = create_empty_bb (EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb); + BB_COPY_PARTITION (new_bb, pro); + if (dump_file) + fprintf (dump_file, "Made prologue block %d\n", new_bb->index); + + for (ei = ei_start (pro->preds); (e = ei_safe_edge (ei)); ) + { + if (bitmap_bit_p (bb_with, e->src->index) + || dominated_by_p (CDI_DOMINATORS, e->src, pro)) + { + ei_next (&ei); + continue; + } + + new_bb->count += RDIV (e->src->count * e->probability, REG_BR_PROB_BASE); + new_bb->frequency += EDGE_FREQUENCY (e); + + redirect_edge_and_branch_force (e, new_bb); + if (dump_file) + fprintf (dump_file, "Redirected edge from %d\n", e->src->index); + } + + *entry_edge = make_single_succ_edge (new_bb, pro, EDGE_FALLTHRU); + force_nonfallthru (*entry_edge); + free_dominance_info (CDI_DOMINATORS); } diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h index 6819901..527a57a 100644 --- a/gcc/shrink-wrap.h +++ b/gcc/shrink-wrap.h @@ -24,8 +24,8 @@ along with GCC; see the file COPYING3. If not see /* In shrink-wrap.c. */ extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET); -extern void try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge, - bitmap_head *bb_flags, rtx_insn *prologue_seq); +extern void try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_flags, + rtx_insn *prologue_seq); extern edge get_unconverted_simple_return (edge, bitmap_head, vec *, rtx_insn **); extern void convert_to_simple_return (edge entry_edge, edge orig_entry_edge,