From patchwork Sun Nov 23 22:22:07 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Pop X-Patchwork-Id: 413523 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 325F414012F for ; Mon, 24 Nov 2014 09:22:26 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; q=dns; s=default; b=ZOGpFcHDYcKJmZ1VK yyTDQgKzk6WqbEqmjxNkGfy6upryj9ZcERXDW/mCzvGYMcAPr2Vv2+Oda0FMGDHO gwIKNqo10Lv1K9dR3W6s5hmDJwhPB3iT5H3QDId2wRrfZgYKHayNavxIB5gZTGn7 vXNMGQ5NZpuIsg+nvt+stcr9oM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=default; bh=K42Ao3BbiBmTG2Tb9dvPuzV KNBc=; b=anJEzLN2XCu4Fm6LNAmsFt7vqTPWyGUWNw/E6wF4TjmElk5zhptA63S hDxW9QmmkMssGJW3/brb0E5XrUq4VWG82vi9ucZdHvLGcf9DCJQkoIdUvAxZq6Z/ SufPBcn3Ba4SDmY5NGEuMX7Xwnw6oJTcbGGdo7rwJwTPV41Lh7aM= Received: (qmail 5807 invoked by alias); 23 Nov 2014 22:22:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 5797 invoked by uid 89); 23 Nov 2014 22:22:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ig0-f170.google.com Received: from mail-ig0-f170.google.com (HELO mail-ig0-f170.google.com) (209.85.213.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Sun, 23 Nov 2014 22:22:12 +0000 Received: by mail-ig0-f170.google.com with SMTP id r2so4193029igi.3 for ; Sun, 23 Nov 2014 14:22:10 -0800 (PST) X-Received: by 10.50.3.8 with SMTP id 8mr8768422igy.45.1416781330784; Sun, 23 Nov 2014 14:22:10 -0800 (PST) Received: from f1.c.bardezibar.internal (81.37.148.146.bc.googleusercontent.com. [146.148.37.81]) by mx.google.com with ESMTPSA id mx10sm3567417igb.21.2014.11.23.14.22.09 for (version=SSLv3 cipher=RC4-SHA bits=128/128); Sun, 23 Nov 2014 14:22:09 -0800 (PST) Date: Sun, 23 Nov 2014 22:22:07 +0000 From: Sebastian Pop To: Jeff Law Cc: Richard Biener , James Greenhalgh , Steve Ellcey , GCC Patches Subject: Re: [Patch] Improving jump-thread pass for PR 54742 Message-ID: <20141123222207.GA24073@f1.c.bardezibar.internal> References: <20140821094109.GA20536@arm.com> <53FB73C0.7090507@redhat.com> <20140926201451.GA16704@f1.c.bardezibar.internal> <20141026213433.GA22140@f1.c.bardezibar.internal> <20141111011404.GA23088@f1.c.bardezibar.internal> <20141118221933.GA7298@f1.c.bardezibar.internal> <5470E495.7070601@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <5470E495.7070601@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Jeff Law wrote: > >PS: I have run some perf analysis with the patch: > >- on a bootstrap of GCC I see 3209 FSM jump threads > >- libpng and libjpeg contain FSM jump threads, the perf increase is in the 1% > > (measured on simulators and reduced data sets) > >- coremark gets jump threaded (as expected) > >- I'm setting up the llvm test-suite and I will report perf differences > So that's *far* more jump threads than I would expect this to find > in a bootstrap of GCC -- like 3 orders of magnitude more than I'd > expect to find. The second patch attached limits the search for FSM jump threads to loops. With that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap (and 424 jump threads on powerpc64-linux bootstrap.) > I haven't dug deep, but the first level analysis is not encouraging. > > Basically I used the trunk compiler with and without your patch to > build gcc-4.7.3's cc1 (4.7.3 simply because that's what I last used > this testing framework). So that gives me two cc1s that I then use > to compile a bunch of .i files under valgrind's (cachegrind) > control. > > valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ...... > > That gives me two hunks of data for each input file I test. > Specifically I get the dynamic number of instructions and the > dynamic number of branches executed. > > For jump threading those values correspond directly to the effect > we're looking for -- a reduction in dynamic conditional jumps and a > reduction in dynamic instructions executed. Typically the change in > dynamic instructions executed is 2-3X the change in dynamic > conditional jumps -- which makes sense as removing the conditional > jump usually means we remove a comparison and possibly some setup > code as well. > > Consistently with your patch those values get worse. Across the > entire set of .i files I get > > For the trunk: > > instructions:1339016494968 > branches: 243568982489 > > With your patch: > > instructions:1339739533291 > branches: 243806615986 > > > So that's 723038323 more instructions and 237633497 more branches > after installing your patch. While we're looking a just under .1% > regression in dynamic branches, that's a terrible result for this > work. One of the reasons I think we see more branches is that in sese region copying we do not use the knowledge of the value of the condition for the last branch in a jump-thread path: we rely on other propagation passes to remove the branch. The last attached patch adds: /* Remove the last branch in the jump thread path. */ remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest); Please let me know if the attached patches are producing better results on gcc. I rebased the original patch on trunk and all patches bootstrapped together on x86_64-linux and powerpc64-linux. Thanks, Sebastian From b9b6155099d81b5ee6322e8bba2e3ba5d4f00b6e Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Sun, 23 Nov 2014 10:52:11 -0600 Subject: [PATCH 4/4] make copied region single entry and remove last condition stmt --- gcc/tree-cfg.c | 2 +- gcc/tree-cfg.h | 1 + gcc/tree-ssa-threadupdate.c | 151 +++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 151 insertions(+), 3 deletions(-) diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 6d96c52..ffa5162 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -2666,7 +2666,7 @@ reinstall_phi_args (edge new_edge, edge old_edge) near its "logical" location. This is of most help to humans looking at debugging dumps. */ -static basic_block +basic_block split_edge_bb_loc (edge edge_in) { basic_block dest = edge_in->dest; diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h index 626e973..51f0899 100644 --- a/gcc/tree-cfg.h +++ b/gcc/tree-cfg.h @@ -67,6 +67,7 @@ extern void verify_gimple_in_cfg (struct function *, bool); extern tree gimple_block_label (basic_block); extern void add_phi_args_after_copy_bb (basic_block); extern void add_phi_args_after_copy (basic_block *, unsigned, edge); +extern basic_block split_edge_bb_loc (edge); extern bool gimple_duplicate_sese_region (edge, edge, basic_block *, unsigned, basic_block *, bool); extern bool gimple_duplicate_sese_tail (edge, edge, basic_block *, unsigned, diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index dd2b518..3ee2117 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2318,6 +2318,153 @@ bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED) return false; } +/* Duplicates a Single Entry Multiple Exit REGION (set of N_REGION basic + blocks). The ENTRY edge is redirected to the duplicate of the region. If + REGION is not a Single Entry region, ignore any incoming edges other than + ENTRY: this makes the copied region a Single Entry region. + + Remove the last conditional statement in the last basic block in the REGION, + and create a single fallthru edge pointing to the same destination as the + EXIT edge. + + The new basic blocks are stored to REGION_COPY in the same order as they had + in REGION, provided that REGION_COPY is not NULL. + + Returns false if it is unable to copy the region, true otherwise. */ + +static bool +duplicate_seme_region (edge entry, edge exit, + basic_block *region, unsigned n_region, + basic_block *region_copy) +{ + unsigned i; + bool free_region_copy = false, copying_header = false; + struct loop *loop = entry->dest->loop_father; + edge exit_copy; + edge redirected; + int total_freq = 0, entry_freq = 0; + gcov_type total_count = 0, entry_count = 0; + + if (!can_copy_bbs_p (region, n_region)) + return false; + + /* Some sanity checking. Note that we do not check for all possible + missuses of the functions. I.e. if you ask to copy something weird, + it will work, but the state of structures probably will not be + correct. */ + for (i = 0; i < n_region; i++) + { + /* We do not handle subloops, i.e. all the blocks must belong to the + same loop. */ + if (region[i]->loop_father != loop) + return false; + } + + initialize_original_copy_tables (); + + if (copying_header) + set_loop_copy (loop, loop_outer (loop)); + else + set_loop_copy (loop, loop); + + if (!region_copy) + { + region_copy = XNEWVEC (basic_block, n_region); + free_region_copy = true; + } + + if (entry->dest->count) + { + total_count = entry->dest->count; + entry_count = entry->count; + /* Fix up corner cases, to avoid division by zero or creation of negative + frequencies. */ + if (entry_count > total_count) + entry_count = total_count; + } + else + { + total_freq = entry->dest->frequency; + entry_freq = EDGE_FREQUENCY (entry); + /* Fix up corner cases, to avoid division by zero or creation of negative + frequencies. */ + if (total_freq == 0) + total_freq = 1; + else if (entry_freq > total_freq) + entry_freq = total_freq; + } + + copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop, + split_edge_bb_loc (entry), 0); + if (total_count) + { + scale_bbs_frequencies_gcov_type (region, n_region, + total_count - entry_count, + total_count); + scale_bbs_frequencies_gcov_type (region_copy, n_region, entry_count, + total_count); + } + else + { + scale_bbs_frequencies_int (region, n_region, total_freq - entry_freq, + total_freq); + scale_bbs_frequencies_int (region_copy, n_region, entry_freq, total_freq); + } + +#ifdef ENABLE_CHECKING + /* Make sure no edge other than ENTRY is entering the copied region. */ + for (i = 0; i < n_region; i++) + { + edge e; + edge_iterator ei; + basic_block bb = region_copy[i]; + + if (single_pred_p (bb)) + continue; + + for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ei_next (&ei)) + { + basic_block x = e->src; + bool found = false; + + for (unsigned j = 0; j < n_region; j++) + if (x == region_copy[j]) + { + found = true; + break; + } + + gcc_assert (found); + } + } +#endif + + /* Remove the last branch in the jump thread path. */ + remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest); + edge e = make_edge (region_copy[n_region - 1], exit->dest, EDGE_FALLTHRU); + + if (e) { + rescan_loop_exit (e, true, false); + e->probability = REG_BR_PROB_BASE; + e->count = region_copy[n_region - 1]->count; + //copy_phi_args (e->dest, rd->path->last ()->e, e, rd->path, idx); + } + + /* Redirect the entry and add the phi node arguments. */ + redirected = redirect_edge_and_branch (entry, get_bb_copy (entry->dest)); + gcc_assert (redirected != NULL); + flush_pending_stmts (entry); + + /* Add the other PHI node arguments. */ + add_phi_args_after_copy (region_copy, n_region, NULL); + + if (free_region_copy) + free (region_copy); + + free_original_copy_tables (); + return true; +} + /* Walk through all blocks and thread incoming edges to the appropriate outgoing edge for each edge pair recorded in THREADED_EDGES. @@ -2363,8 +2510,8 @@ thread_through_all_blocks (bool may_peel_loop_headers) for (unsigned int j = 0; j < len - 1; j++) region[j] = (*path)[j]->e->dest; - bool success = gimple_duplicate_sese_region (entry, exit, region, - len - 1, NULL, 0); + bool success = duplicate_seme_region (entry, exit, region, + len - 1, NULL); if (success) { dump_jump_thread_path (stderr, *path, false); -- 1.9.1