From patchwork Wed Dec 13 16:12:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filip Kastl X-Patchwork-Id: 1875774 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Sr0qz46W3z20LX for ; Thu, 14 Dec 2023 03:12:31 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EAAB5385C6DC for ; Wed, 13 Dec 2023 16:12:28 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2]) by sourceware.org (Postfix) with ESMTPS id E4FD83858298 for ; Wed, 13 Dec 2023 16:12:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E4FD83858298 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.cz ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E4FD83858298 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:2 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702483937; cv=none; b=BO/drTtXQcwDxnZ6hdqMA8nPD1HvH92sXvUmvVisZ93pjyOrq+EvJNZmdgZNPFzRxpPBDt4GIXFpFvpb0o0Ukhpln2YSpJvaHArfp542QggYIUUaLfHwgHx/nxTtlacPDm3OB3L4sBFVP6JN5v3Ruq0kXglW8iWfl1/HzByAd3w= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702483937; c=relaxed/simple; bh=C2oYn8P8F9a62dMeOp+liiSpgVfaTBBKk1DXS/eFrJs=; h=Date:From:To:Subject:Message-ID:MIME-Version; b=EBAdFPzYvk+mIfMvkMt3y4ooi0SWFlILzlfp6JVLxH0QsgDPmaXqeSoq+TwOY6Dp+8yGIOV+OMG9d0QfE7zaOl582OxjeqXlNmEia2xHKHHaJPzELo8Ko0IVu8zxToeZpb1L4MEMMnl4c5BITSj1aJxZ/c/Dxh3IcGyxj47Jtho= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from imap2.dmz-prg2.suse.org (imap2.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:98]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 19DA61F443; Wed, 13 Dec 2023 16:12:12 +0000 (UTC) Received: from imap2.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap2.dmz-prg2.suse.org (Postfix) with ESMTPS id 0E2F31391D; Wed, 13 Dec 2023 16:12:12 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap2.dmz-prg2.suse.org with ESMTPSA id uRBzA9zXeWUFYgAAn2gu4w (envelope-from ); Wed, 13 Dec 2023 16:12:12 +0000 Date: Wed, 13 Dec 2023 17:12:11 +0100 From: Filip Kastl To: gcc-patches@gcc.gnu.org Cc: Richard Biener , hubicka@ucw.cz Subject: [PATCH v4] A new copy propagation and PHI elimination pass Message-ID: References: <571578n9-pn64-qqqq-q59s-866548669143@fhfr.qr> <39489or3-7659-5306-4133-6p3n2pr58p99@fhfr.qr> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Level: X-Spam-Score: -4.00 X-Spam-Level: Authentication-Results: smtp-out2.suse.de; none X-Rspamd-Server: rspamd2 X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Spam-Score: -4.00 X-Rspamd-Queue-Id: 19DA61F443 X-Spam-Flag: NO X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org > > > Hi, > > > > > > this is a patch that I submitted two months ago as an RFC. I added some polish > > > since. > > > > > > It is a new lightweight pass that removes redundant PHI functions and as a > > > bonus does basic copy propagation. With Jan Hubi?ka we measured that it is able > > > to remove usually more than 5% of all PHI functions when run among early passes > > > (sometimes even 13% or more). Those are mostly PHI functions that would be > > > later optimized away but with this pass it is possible to remove them early > > > enough so that they don't get streamed when runing LTO (and also potentially > > > inlined at multiple places). It is also able to remove some redundant PHIs > > > that otherwise would still be present during RTL expansion. > > > > > > Jakub Jel?nek was concerned about debug info coverage so I compiled cc1plus > > > with and without this patch. These are the sizes of .debug_info and > > > .debug_loclists > > > > > > .debug_info without patch 181694311 > > > .debug_info with patch 181692320 > > > +0.0011% change > > > > > > .debug_loclists without patch 47934753 > > > .debug_loclists with patch 47934966 > > > -0.0004% change > > > > > > I wanted to use dwlocstat to compare debug coverages but didn't manage to get > > > the program working on my machine sadly. Hope this suffices. Seems to me that > > > my patch doesn't have a significant impact on debug info. > > > > > > Bootstraped and tested* on x86_64-pc-linux-gnu. > > > > > > * One testcase (pr79691.c) did regress. However that is because the test is > > > dependent on a certain variable not being copy propagated. I will go into more > > > detail about this in a reply to this mail. > > > > > > Ok to commit? > > > > This is a second version of the patch. In this version, I modified the > > pr79691.c testcase so that it works as intended with other changes from the > > patch. > > > > The pr79691.c testcase checks that we get constants from snprintf calls and > > that they simplify into a single constant. The testcase doesn't account for > > the fact that this constant may be further copy propagated which is exactly > > what happens with this patch applied. > > > > Bootstrapped and tested on x86_64-pc-linux-gnu. > > > > Ok to commit? > > This is the third version of the patch. In this version, I adressed most of > Richards remarks about the second version. Here is a summary of changes I made: > > - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc > - Use simple_dce_from_worklist to remove propagated statements > - Use existing replace_uses API instead of reinventing it > - This allowed me to get rid of some now redundant cleanup code > - Encapsulate the SCC finding into a class > - Rework stmt_may_generate_copy to get rid of redundant checks > - Add check that PHI doesn't contain two non-SSA-name values to > stmt_may_generate_copy > - Regarding alignment and value ranges in stmt_may_generate_copy: For now use > the conservative check that Richard suggested > - Index array of vertices that SCC discovery uses by SSA name version numbers > instead of numbering statements myself. > > > I didn't make any changes based on these remarks: > > 1 It might be nice to optimize SCCs of size 1 somehow, not sure how > many times these appear - possibly prevent them from even entering > the SCC discovery? > > It would be nice. But the only way to do this that I see right now is to first > propagate SCCs of size 1 and then the rest. This would mean adding a new copy > propagation procedure. It wouldn't be a trivial procedure. Efficiency of the > pass relies on having SCCs topogically sorted so this procedure would have to > implement some topological sort algorithm. > > This could be done. It could save allocating some vec<>s (right now, SCCs of > size 1 are represented by a vec<> with a single element). But is it worth it to > optimize the pass this way right now? If possible, I'd like to see that the > pass works and sort out any problems people encounter with it before I start > optimizing it. > > 2 Instead of collecting all stmts that may generate a copy at the beginning of > the pass into a vec<>, let the SCC discovery check that statements may > generate a copy on the fly. > > This would be a big change to the pass, it would require a lot of reworking. > I'm also not sure if this would help reduce the number of allocated vec<>s that > much because I'll still want to represent SCCs by vec<>s. > > Again - its possible I'll want to rework the pass in this way in the future but > I'd like to leave it as it is for now. > > 3 Add a comment saying that the pass is doing optimistic copy propagation > > I don't think the pass works in an optimistic way. It doesn't assume that all > variables are copies of each other at any point. It instead identifies copy > statements (or PHI SCCs that act as copy statements) and then for each of these > it propagates: By that I mean if a copy statement says that _3 is a copy of _2, > then it replaces all uses of _3 by _2. > > But its possible that I still misinterpret what 'optimistic' means. If > mentioning that it works in an optimistic way would help readability of the > pass, I'm happy to add that comment. > > Richard, is the patch ok as it is now? Or do you think that making changes 1, 2 > or 3 is necessary? Or are there any other issues which should be adressed? > > Filip This is the fourth version of this patch. Changes in this version: - Disabled the pass for -Og - This meant that I could revert my changes to gcc.dg/tree-ssa/pr79691.c - Fixed formatting of the patch (like braces around a single-statment if body) - compute_sccs (auto_vec<>...) -> compute_sccs (vec<> ...) - auto_vec worklist; worklist = ...; -> auto_vec worklist = ...; I've also checked that there are no memory leak issues. Valgrind doesn't detect any aditional leaks when running with sccopy vs running without sccopy: [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 ==27363== Memcheck, a memory error detector ==27363== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==27363== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==27363== Command: /home/fkastl/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 ==27363== ==27363== ==27363== HEAP SUMMARY: ==27363== in use at exit: 174,234 bytes in 92 blocks ==27363== total heap usage: 408 allocs, 316 frees, 228,050 bytes allocated ==27363== ==27363== LEAK SUMMARY: ==27363== definitely lost: 5,935 bytes in 23 blocks ==27363== indirectly lost: 82 bytes in 5 blocks ==27363== possibly lost: 0 bytes in 0 blocks ==27363== still reachable: 168,217 bytes in 64 blocks ==27363== of which reachable via heuristic: ==27363== newarray : 1,544 bytes in 1 blocks ==27363== suppressed: 0 bytes in 0 blocks ==27363== Rerun with --leak-check=full to see details of leaked memory ==27363== ==27363== For lists of detected and suppressed errors, rerun with: -s ==27363== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 -fdisable-tree-sccopy1 -fdisable-tree-sccopy2 ==27372== Memcheck, a memory error detector ==27372== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==27372== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==27372== Command: /home/fkastl/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 -fdisable-tree-sccopy1 -fdisable-tree-sccopy2 ==27372== cc1: note: disable pass tree-sccopy1 for functions in the range of [0, 4294967295] cc1: note: disable pass tree-sccopy2 for functions in the range of [0, 4294967295] ==27372== ==27372== HEAP SUMMARY: ==27372== in use at exit: 174,282 bytes in 92 blocks ==27372== total heap usage: 408 allocs, 316 frees, 228,674 bytes allocated ==27372== ==27372== LEAK SUMMARY: ==27372== definitely lost: 5,935 bytes in 23 blocks ==27372== indirectly lost: 82 bytes in 5 blocks ==27372== possibly lost: 0 bytes in 0 blocks ==27372== still reachable: 168,265 bytes in 64 blocks ==27372== of which reachable via heuristic: ==27372== newarray : 1,544 bytes in 1 blocks ==27372== suppressed: 0 bytes in 0 blocks ==27372== Rerun with --leak-check=full to see details of leaked memory ==27372== ==27372== For lists of detected and suppressed errors, rerun with: -s ==27372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) Richard, I wrote down the bitmap vs hash_set optimizations you suggested so that I can work on the pass more later and optimize it. I didn't yet bootstrap this version. I just regtested it. Will bootstrap and regtest on the most current commit. Once that is successfully done, is the pass OK to be pushed to main? Filip -- >8 -- This patch adds the strongly-connected copy propagation (SCCOPY) pass. It is a lightweight GIMPLE copy propagation pass that also removes some redundant PHI statements. It handles degenerate PHIs, e.g.: _5 = PHI <_1>; _6 = PHI <_6, _6, _1, _1>; _7 = PHI <16, _7>; // Replaces occurences of _5 and _6 by _1 and _7 by 16 It also handles more complicated situations, e.g.: _8 = PHI <_9, _10>; _9 = PHI <_8, _10>; _10 = PHI <_8, _9, _1>; // Replaces occurences of _8, _9 and _10 by _1 gcc/ChangeLog: * Makefile.in: Added sccopy pass. * passes.def: Added sccopy pass before LTO streaming and before RTL expansion. * tree-pass.h (make_pass_sccopy): Added sccopy pass. * gimple-ssa-sccopy.cc: New file. gcc/testsuite/ChangeLog: * gcc.dg/sccopy-1.c: New test. Signed-off-by: Filip Kastl --- gcc/Makefile.in | 1 + gcc/gimple-ssa-sccopy.cc | 680 ++++++++++++++++++++++++++++++++ gcc/passes.def | 2 + gcc/testsuite/gcc.dg/sccopy-1.c | 78 ++++ gcc/tree-pass.h | 1 + 5 files changed, 762 insertions(+) create mode 100644 gcc/gimple-ssa-sccopy.cc create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 753f2f36618..2e6f2fef1ba 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1497,6 +1497,7 @@ OBJS = \ gimple-ssa-backprop.o \ gimple-ssa-isolate-paths.o \ gimple-ssa-nonnull-compare.o \ + gimple-ssa-sccopy.o \ gimple-ssa-split-paths.o \ gimple-ssa-store-merging.o \ gimple-ssa-strength-reduction.o \ diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc new file mode 100644 index 00000000000..ac5ec32eb32 --- /dev/null +++ b/gcc/gimple-ssa-sccopy.cc @@ -0,0 +1,680 @@ +/* Strongly-connected copy propagation pass for the GNU compiler. + Copyright (C) 2023 Free Software Foundation, Inc. + Contributed by Filip Kastl + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "ssa.h" +#include "gimple-iterator.h" +#include "vec.h" +#include "hash-set.h" +#include +#include "ssa-iterators.h" +#include "gimple-fold.h" +#include "gimplify.h" +#include "tree-cfg.h" +#include "tree-eh.h" +#include "builtins.h" +#include "tree-ssa-dce.h" +#include "fold-const.h" + +/* Strongly connected copy propagation pass. + + This is a lightweight copy propagation pass that is also able to eliminate + redundant PHI statements. The pass considers the following types of copy + statements: + + 1 An assignment statement with a single argument. + + _3 = _2; + _4 = 5; + + 2 A degenerate PHI statement. A degenerate PHI is a PHI that only refers to + itself or one other value. + + _5 = PHI <_1>; + _6 = PHI <_6, _6, _1, _1>; + _7 = PHI <16, _7>; + + 3 A set of PHI statements that only refer to each other or to one other + value. + + _8 = PHI <_9, _10>; + _9 = PHI <_8, _10>; + _10 = PHI <_8, _9, _1>; + + All of these statements produce copies and can be eliminated from the + program. For a copy statement we identify the value it creates a copy of + and replace references to the statement with the value -- we propagate the + copy. + + _3 = _2; // Replace all occurences of _3 by _2 + + _8 = PHI <_9, _10>; + _9 = PHI <_8, _10>; + _10 = PHI <_8, _9, _1>; // Replace all occurences of _8, _9 and _10 by _1 + + To find all three types of copy statements we use an algorithm based on + strongly-connected components (SCCs) in dataflow graph. The algorithm was + introduced in an article from 2013[1]. We describe the algorithm bellow. + + To identify SCCs we implement the Robert Tarjan's SCC algorithm. For the + SCC computation we wrap potential copy statements in the 'vertex' struct. + To each of these statements we also assign a vertex number ('vxnum'). Since + the main algorithm has to be able to compute SCCs of subgraphs of the whole + dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from + leaving the subgraph. + + References: + + [1] Simple and Efficient Construction of Static Single Assignmemnt Form, + Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791, + Section 3.2. */ + +/* Bitmap tracking statements which were propagated to be removed at the end of + the pass. */ + +static bitmap dead_stmts; + +/* State of vertex during SCC discovery. + + unvisited Vertex hasn't yet been popped from worklist. + vopen DFS has visited vertex for the first time. Vertex has been put + on Tarjan stack. + closed DFS has backtracked through vertex. At this point, vertex + doesn't have any unvisited neighbors. + in_scc Vertex has been popped from Tarjan stack. */ + +enum vstate +{ + unvisited, + vopen, + closed, + in_scc +}; + +/* Information about a vertex. Used by SCC discovery. */ + +struct vertex +{ + bool active; /* scc_discovery::compute_sccs () only considers a subgraph of + the whole dataflow graph. It uses this flag so that it knows + which vertices are part of this subgraph. */ + vstate state; + unsigned index; + unsigned lowlink; +}; + +/* SCC discovery. + + Used to find SCCs in a dataflow graph. Implements Tarjan's SCC + algorithm. */ + +class scc_discovery +{ +public: + scc_discovery (); + ~scc_discovery (); + auto_vec> compute_sccs (vec &stmts); + +private: + unsigned curr_generation = 0; + vertex* vertices; /* Indexed by SSA_NAME_VERSION. */ + auto_vec worklist; /* DFS stack. */ + auto_vec stack; /* Tarjan stack. */ + + void visit_neighbor (tree neigh_tree, unsigned parent_vxnum); +}; + +scc_discovery::scc_discovery () +{ + /* Create vertex struct for each SSA name. */ + vertices = XNEWVEC (struct vertex, num_ssa_names); + unsigned i = 0; + for (i = 0; i < num_ssa_names; i++) + vertices[i].active = false; +} + +scc_discovery::~scc_discovery () +{ + XDELETEVEC (vertices); +} + +/* Part of 'scc_discovery::compute_sccs ()'. */ + +void +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version) +{ + if (TREE_CODE (neigh_tree) != SSA_NAME) + return; /* Skip any neighbor that isn't an SSA name. */ + unsigned neigh_version = SSA_NAME_VERSION (neigh_tree); + + /* Skip neighbors outside the subgraph that Tarjan currently works + with. */ + if (!vertices[neigh_version].active) + return; + + vstate neigh_state = vertices[neigh_version].state; + vstate parent_state = vertices[parent_version].state; + if (parent_state == vopen) /* We're currently opening parent. */ + { + /* Put unvisited neighbors on worklist. Update lowlink of parent + vertex according to indices of neighbors present on stack. */ + switch (neigh_state) + { + case unvisited: + worklist.safe_push (neigh_version); + break; + case vopen: + case closed: + vertices[parent_version].lowlink + = std::min (vertices[parent_version].lowlink, + vertices[neigh_version].index); + break; + case in_scc: + /* Ignore these edges. */ + break; + } + } + else if (parent_state == closed) /* We're currently closing parent. */ + { + /* Update lowlink of parent vertex according to lowlinks of + children of parent (in terms of DFS tree). */ + if (neigh_state == closed) + { + vertices[parent_version].lowlink + = std::min (vertices[parent_version].lowlink, + vertices[neigh_version].lowlink); + } + } +} + +/* Compute SCCs in dataflow graph on given statements 'stmts'. Ignore + statements outside 'stmts'. Return the SCCs in a reverse topological + order. + + stmt_may_generate_copy () must be true for all statements from 'stmts'! */ + +auto_vec> +scc_discovery::compute_sccs (vec &stmts) +{ + auto_vec> sccs; + + for (gimple *stmt : stmts) + { + unsigned i; + switch (gimple_code (stmt)) + { + case GIMPLE_ASSIGN: + i = SSA_NAME_VERSION (gimple_assign_lhs (stmt)); + break; + case GIMPLE_PHI: + i = SSA_NAME_VERSION (gimple_phi_result (stmt)); + break; + default: + gcc_unreachable (); + } + + vertices[i].index = 0; + vertices[i].lowlink = 0; + vertices[i].state = unvisited; + vertices[i].active = true; /* Mark the subgraph we'll be working on so + that we don't leave it. */ + + worklist.safe_push (i); + } + + /* Worklist loop. */ + unsigned curr_index = 0; + while (!worklist.is_empty ()) + { + unsigned i = worklist.pop (); + gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i)); + vstate state = vertices[i].state; + + if (state == unvisited) + { + vertices[i].state = vopen; + + /* Assign index to this vertex. */ + vertices[i].index = curr_index; + vertices[i].lowlink = curr_index; + curr_index++; + + /* Put vertex on stack and also on worklist to be closed later. */ + stack.safe_push (i); + worklist.safe_push (i); + } + else if (state == vopen) + vertices[i].state = closed; + + /* Visit neighbors of this vertex. */ + tree op; + gphi *phi; + switch (gimple_code (stmt)) + { + case GIMPLE_PHI: + phi = as_a (stmt); + unsigned j; + for (j = 0; j < gimple_phi_num_args (phi); j++) + { + op = gimple_phi_arg_def (phi, j); + visit_neighbor (op, i); + } + break; + case GIMPLE_ASSIGN: + op = gimple_assign_rhs1 (stmt); + visit_neighbor (op, i); + break; + default: + gcc_unreachable (); + } + + /* If we've just closed a root vertex of an scc, pop scc from stack. */ + if (state == vopen && vertices[i].lowlink == vertices[i].index) + { + vec scc = vNULL; + + unsigned j; + do + { + j = stack.pop (); + scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j))); + vertices[j].state = in_scc; + } + while (j != i); + + sccs.safe_push (scc); + } + } + + if (!stack.is_empty ()) + gcc_unreachable (); + + /* Clear 'active' flags. */ + for (gimple *stmt : stmts) + { + unsigned i; + switch (gimple_code (stmt)) + { + case GIMPLE_ASSIGN: + i = SSA_NAME_VERSION (gimple_assign_lhs (stmt)); + break; + case GIMPLE_PHI: + i = SSA_NAME_VERSION (gimple_phi_result (stmt)); + break; + default: + gcc_unreachable (); + } + + vertices[i].active = false; + } + + return sccs; +} + +/* Could this statement potentially be a copy statement? + + This pass only considers statements for which this function returns 'true'. + Those are basically PHI functions and assignment statements similar to + + _2 = _1; + or + _2 = 5; */ + +static bool +stmt_may_generate_copy (gimple *stmt) +{ + /* A PHI may generate a copy. */ + if (gimple_code (stmt) == GIMPLE_PHI) + { + gphi *phi = as_a (stmt); + + /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs. */ + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi))) + return false; + + unsigned i; + for (i = 0; i < gimple_phi_num_args (phi); i++) + { + tree op = gimple_phi_arg_def (phi, i); + if (TREE_CODE (op) == SSA_NAME + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op)) + return false; + } + + /* If PHI has more than one unique non-SSA arguments, it won't generate a + copy. */ + tree const_op = NULL_TREE; + for (i = 0; i < gimple_phi_num_args (phi); i++) + { + tree op = gimple_phi_arg_def (phi, i); + if (TREE_CODE (op) != SSA_NAME) + { + if (const_op && !operand_equal_p (op, const_op)) + return false; + const_op = op; + } + } + + return true; + } + + /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy. */ + + if (!gimple_assign_single_p (stmt)) + return false; + + tree lhs = gimple_assign_lhs (stmt); + tree rhs = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (lhs) != SSA_NAME) + return false; + + /* lhs shouldn't flow through any abnormal edges. */ + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) + return false; + + if (is_gimple_min_invariant (rhs)) + return true; /* A statement of type _2 = 5;. */ + + if (TREE_CODE (rhs) != SSA_NAME) + return false; + + /* rhs shouldn't flow through any abnormal edges. */ + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs)) + return false; + + /* It is possible that lhs has more alignment or value range information. By + propagating we would lose this information. So in the case that alignment + or value range information differs, we are conservative and do not + propagate. + + FIXME: Propagate alignment and value range info the same way copy-prop + does. */ + if (POINTER_TYPE_P (TREE_TYPE (lhs)) + && POINTER_TYPE_P (TREE_TYPE (rhs)) + && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs)) + return false; + if (!POINTER_TYPE_P (TREE_TYPE (lhs)) + && !POINTER_TYPE_P (TREE_TYPE (rhs)) + && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs)) + return false; + + return true; /* A statement of type _2 = _1;. */ +} + +/* Return all statements in cfun that could generate copies. All statements + for which stmt_may_generate_copy returns 'true'. */ + +static auto_vec +get_all_stmt_may_generate_copy (void) +{ + auto_vec result; + + basic_block bb; + FOR_EACH_BB_FN (bb, cfun) + { + gimple_stmt_iterator gsi; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *s = gsi_stmt (gsi); + if (stmt_may_generate_copy (s)) + result.safe_push (s); + } + + gphi_iterator pi; + for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi)) + { + gimple *s = pi.phi (); + if (stmt_may_generate_copy (s)) + result.safe_push (s); + } + } + + return result; +} + +/* For each statement from given SCC, replace its usages by value + VAL. */ + +static void +replace_scc_by_value (vec scc, tree val) +{ + for (gimple *stmt : scc) + { + tree name = gimple_get_lhs (stmt); + replace_uses_by (name, val); + bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name)); + } + + if (dump_file) + fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ()); +} + +/* Part of 'sccopy_propagate ()'. */ + +static void +sccopy_visit_op (tree op, hash_set &outer_ops, + hash_set &scc_set, bool &is_inner, + tree &last_outer_op) +{ + bool op_in_scc = false; + + if (TREE_CODE (op) == SSA_NAME) + { + gimple *op_stmt = SSA_NAME_DEF_STMT (op); + if (scc_set.contains (op_stmt)) + op_in_scc = true; + } + + if (!op_in_scc) + { + outer_ops.add (op); + last_outer_op = op; + is_inner = false; + } +} + +/* Main function of this pass. Find and propagate all three types of copy + statements (see pass description above). + + This is an implementation of an algorithm from the paper Simple and + Efficient Construction of Static Single Assignmemnt Form[1]. It is based + on strongly-connected components (SCCs) in dataflow graph. The original + algorithm only considers PHI statements. We extend it to also consider + assignment statements of type _2 = _1;. + + The algorithm is based on this definition of a set of redundant PHIs[1]: + + A non-empty set P of PHI functions is redundant iff the PHI functions just + reference each other or one other value + + It uses this lemma[1]: + + Let P be a redundant set of PHI functions. Then there is a + strongly-connected component S subset of P that is also redundant. + + The algorithm works in this way: + + 1 Find SCCs + 2 For each SCC S in topological order: + 3 Construct set 'inner' of statements that only have other statements + from S on their right hand side + 4 Construct set 'outer' of values that originate outside S and appear on + right hand side of some statement from S + 5 If |outer| = 1, outer only contains a value v. Statements in S only + refer to each other or to v -- they are redundant. Propagate v. + Else, recurse on statements in inner. + + The implementation is non-recursive. + + References: + + [1] Simple and Efficient Construction of Static Single Assignmemnt Form, + Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791, + Section 3.2. */ + +static void +sccopy_propagate () +{ + auto_vec useful_stmts = get_all_stmt_may_generate_copy (); + scc_discovery discovery; + + auto_vec> worklist = discovery.compute_sccs (useful_stmts); + + while (!worklist.is_empty ()) + { + vec scc = worklist.pop (); + + auto_vec inner; + hash_set outer_ops; + tree last_outer_op = NULL_TREE; + + /* Prepare hash set of PHIs in scc to query later. */ + hash_set scc_set; + for (gimple *stmt : scc) + scc_set.add (stmt); + + for (gimple *stmt : scc) + { + bool is_inner = true; + + gphi *phi; + tree op; + + switch (gimple_code (stmt)) + { + case GIMPLE_PHI: + phi = as_a (stmt); + unsigned j; + for (j = 0; j < gimple_phi_num_args (phi); j++) + { + op = gimple_phi_arg_def (phi, j); + sccopy_visit_op (op, outer_ops, scc_set, is_inner, + last_outer_op); + } + break; + case GIMPLE_ASSIGN: + op = gimple_assign_rhs1 (stmt); + sccopy_visit_op (op, outer_ops, scc_set, is_inner, + last_outer_op); + break; + default: + gcc_unreachable (); + } + + if (is_inner) + inner.safe_push (stmt); + } + + if (outer_ops.elements () == 1) + { + /* The only operand in outer_ops. */ + tree outer_op = last_outer_op; + replace_scc_by_value (scc, outer_op); + } + else if (outer_ops.elements () > 1) + { + /* Add inner sccs to worklist. */ + auto_vec> inner_sccs + = discovery.compute_sccs (inner); + for (vec inner_scc : inner_sccs) + worklist.safe_push (inner_scc); + } + else + gcc_unreachable (); + + scc.release (); + } +} + +/* Called when pass execution starts. */ + +static void +init_sccopy (void) +{ + /* For propagated statements. */ + dead_stmts = BITMAP_ALLOC (NULL); +} + +/* Called before pass execution ends. */ + +static void +finalize_sccopy (void) +{ + /* Remove all propagated statements. */ + simple_dce_from_worklist (dead_stmts); + BITMAP_FREE (dead_stmts); + + /* Propagating a constant may create dead eh edges. */ + basic_block bb; + FOR_EACH_BB_FN (bb, cfun) + gimple_purge_dead_eh_edges (bb); +} + +namespace { + +const pass_data pass_data_sccopy = +{ + GIMPLE_PASS, /* type */ + "sccopy", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_NONE, /* tv_id */ + ( PROP_cfg | PROP_ssa ), /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */ +}; + +class pass_sccopy : public gimple_opt_pass +{ +public: + pass_sccopy (gcc::context *ctxt) + : gimple_opt_pass (pass_data_sccopy, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return true; } + virtual unsigned int execute (function *); + opt_pass * clone () final override { return new pass_sccopy (m_ctxt); } +}; // class pass_sccopy + +unsigned +pass_sccopy::execute (function *) +{ + init_sccopy (); + sccopy_propagate (); + finalize_sccopy (); + return 0; +} + +} // anon namespace + +gimple_opt_pass * +make_pass_sccopy (gcc::context *ctxt) +{ + return new pass_sccopy (ctxt); +} diff --git a/gcc/passes.def b/gcc/passes.def index 1e1950bdb39..048ad3ee7a5 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_if_to_switch); NEXT_PASS (pass_convert_switch); NEXT_PASS (pass_cleanup_eh); + NEXT_PASS (pass_sccopy); NEXT_PASS (pass_profile); NEXT_PASS (pass_local_pure_const); NEXT_PASS (pass_modref); @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3. If not see However, this also causes us to misdiagnose cases that should be real warnings (e.g., testsuite/gcc.dg/pr18501.c). */ NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); + NEXT_PASS (pass_sccopy); NEXT_PASS (pass_tail_calls); /* Split critical edges before late uninit warning to reduce the number of false positives from it. */ diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c new file mode 100644 index 00000000000..1e61a6b320e --- /dev/null +++ b/gcc/testsuite/gcc.dg/sccopy-1.c @@ -0,0 +1,78 @@ +/* { dg-do compile } */ +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */ +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */ + +int __GIMPLE (ssa, startwith ("sccopy")) +main () +{ + int a; + int y; + int x; + int _1; + int _2; + int _13; + + __BB(2): + if (x_7(D) == 5) + goto __BB3; + else + goto __BB4; + + __BB(3): + a_10 = x_7(D); + goto __BB5; + + __BB(4): + a_9 = y_8(D); + goto __BB5; + + __BB(5): + a_3 = __PHI (__BB3: a_10, __BB4: a_9); + if (x_7(D) == y_8(D)) + goto __BB6; + else + goto __BB11; + + __BB(6): + a_11 = a_3 + 1; + goto __BB7; + + __BB(7): + a_4 = __PHI (__BB6: a_11, __BB11: a_6); +label1: + if (x_7(D) != y_8(D)) + goto __BB8; + else + goto __BB10; + + __BB(8): + goto __BB9; + + __BB(9): + a_12 = __PHI (__BB8: a_4, __BB10: a_5); + goto __BB10; + + __BB(10,loop_header(1)): + a_5 = __PHI (__BB7: a_4, __BB9: a_12); +label2: + _1 = y_8(D) * 2; + if (x_7(D) == _1) + goto __BB9; + else + goto __BB11; + + __BB(11): + a_6 = __PHI (__BB5: a_3, __BB10: a_5); + _2 = x_7(D) * 3; + if (y_8(D) == _2) + goto __BB7; + else + goto __BB12; + + __BB(12): + _13 = 0; + return _13; + +} + + diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 09e6ada5b2f..94a48461590 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);