From patchwork Mon Jul 15 02:19:56 2019
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 1131829
From: Feng Xue OS
To: Richard Biener, Michael Matz
CC: "gcc-patches@gcc.gnu.org"
Subject: Ping again: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
Date: Mon, 15 Jul 2019 02:19:56 +0000

Some time has passed, so ping again. I made this patch because it gives us
a 7% performance benefit in a real application. For convenience, the
optimization to be implemented is described below again. I would appreciate
your comments on the patch, or any design suggestions. Thanks!

Suppose a loop such as:

void f (std::map m)
{
  for (auto it = m.begin (); it != m.end (); ++it)
    {
      /* if (b) is semi-invariant.  */
      if (b)
        {
          b = do_something ();   /* Has effect on b.  */
        }
      else
        {
          /* No effect on b.  */
        }
      statements;                /* Also no effect on b.  */
    }
}

A transformation, a kind of loop split, could be:

void f (std::map m)
{
  for (auto it = m.begin (); it != m.end (); ++it)
    {
      if (b)
        {
          b = do_something ();
        }
      else
        {
          ++it;
          statements;
          break;
        }
      statements;
    }

  for (; it != m.end (); ++it)
    {
      statements;
    }
}

If "statements" contains nothing, the second loop becomes an empty one,
which can be removed. And if "statements" is straight-line code, we get an
opportunity to vectorize the second loop.

Feng

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a46f93d89d..2334b184945 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2019-06-18  Feng Xue
+
+	PR tree-optimization/89134
+	* doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
+	(min-cond-loop-split-prob): Likewise.
+	* params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
+	* passes.def (pass_cond_loop_split): New pass.
+	* timevar.def (TV_COND_LOOP_SPLIT): New time variable.
+	* tree-pass.h (make_pass_cond_loop_split): New declaration.
+	* tree-ssa-loop-split.c (split_info): New class.
+	(find_vdef_in_loop, vuse_semi_invariant_p): New functions.
+	(ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
+	(branch_removable_p, get_cond_invariant_branch): Likewise.
+	(is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
+	(can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
+	(split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
+	(pass_data_cond_loop_split): New variable.
+	(pass_cond_loop_split): New class.
+	(make_pass_cond_loop_split): New function.
+
 2019-06-18  Kewen Lin
 
 	PR middle-end/80791
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eaef4cd63d2..0427fede3d6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a single loop.
 @item lim-expensive
 The minimum cost of an expensive expression in the loop invariant motion.
 
+@item max-cond-loop-split-insns
+The maximum number of insns to be increased due to loop split on
+semi-invariant condition statement.
+
+@item min-cond-loop-split-prob
+The minimum threshold for probability of semi-invariant condition
+statement to trigger loop split.
+
 @item iv-consider-all-candidates-bound
 Bound on number of candidates for induction variables, below which
 all candidates are considered for each use in induction variable
diff --git a/gcc/params.def b/gcc/params.def
index 0db60951413..5384f7d1c4d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
 	 "The maximum number of unswitchings in a single loop.",
 	 3, 0, 0)
 
+/* The maximum number of increased insns due to loop split on semi-invariant
+   condition statement.  */
+DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
+	 "max-cond-loop-split-insns",
+	 "The maximum number of insns to be increased due to loop split on "
+	 "semi-invariant condition statement.",
+	 100, 0, 0)
+
+DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
+	 "min-cond-loop-split-prob",
+	 "The minimum threshold for probability of semi-invariant condition "
+	 "statement to trigger loop split.",
+	 30, 0, 100)
+
 /* The maximum number of insns in loop header duplicated by
    the copy loop headers pass.  */
 DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
diff --git a/gcc/passes.def b/gcc/passes.def
index ad2efabd385..bb32b88738e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -261,6 +261,7 @@ along with GCC; see the file COPYING3. If not see
 	  NEXT_PASS (pass_tree_unswitch);
 	  NEXT_PASS (pass_scev_cprop);
 	  NEXT_PASS (pass_loop_split);
+	  NEXT_PASS (pass_cond_loop_split);
 	  NEXT_PASS (pass_loop_versioning);
 	  NEXT_PASS (pass_loop_jam);
 	  /* All unswitching, final value replacement and splitting can expose
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 27a522e0140..9aa069e5c29 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-18  Feng Xue
+
+	* gcc.dg/tree-ssa/loop-cond-split-1.c: New test.
+	* g++.dg/tree-ssa/loop-cond-split-1.C: New test.
+
 2019-06-17  Jakub Jelinek
 
 	* gcc.dg/vect/vect-simd-8.c: New test.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C new file mode 100644 index 00000000000..df269c5ee44 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */ + +#include +#include + +using namespace std; + +class A +{ +public: + bool empty; + void set (string s); +}; + +class B +{ + map m; + void f (); +}; + +extern A *ga; + +void B::f () +{ + for (map::iterator iter = m.begin (); iter != m.end (); ++iter) + { + if (ga->empty) + ga->set (iter->second); + } +} + +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c new file mode 100644 index 00000000000..a0eb7a26ad5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-cond_lsplit-details" } */ + +__attribute__((pure)) __attribute__((noinline)) int inc (int i) +{ + return i + 1; +} + +extern int do_something (void); +extern int b; + +void test(int n) +{ + int i; + + for (i = 0; i < n; i = inc (i)) + { + if (b) + b = do_something(); + } +} + +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "cond_lsplit" } } */ diff --git a/gcc/timevar.def b/gcc/timevar.def index 13cb470b688..5a2a80a29f7 100644 --- a/gcc/timevar.def +++ b/gcc/timevar.def @@ -188,6 +188,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv") DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop") DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching") DEFTIMEVAR (TV_LOOP_SPLIT , "loop splitting") +DEFTIMEVAR (TV_COND_LOOP_SPLIT , "loop splitting for conditions") DEFTIMEVAR (TV_LOOP_JAM , "unroll and jam") DEFTIMEVAR (TV_COMPLETE_UNROLL , "complete unrolling") DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops") diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 3a0b3805d24..cdb7ef3c9f2 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -367,6 +367,7 @@ extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_cond_loop_split (gcc::context *ctxt); extern gimple_opt_pass *make_pass_loop_jam (gcc::context *ctxt); extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c index 999c9a30366..7239d0cfb00 100644 --- a/gcc/tree-ssa-loop-split.c +++ b/gcc/tree-ssa-loop-split.c @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-loop.h" #include "tree-ssa-loop-manip.h" #include "tree-into-ssa.h" +#include "tree-inline.h" #include "cfgloop.h" +#include "params.h" #include "tree-scalar-evolution.h" #include "gimple-iterator.h" #include "gimple-pretty-print.h" @@ -40,7 +42,9 @@ along with GCC; see the file COPYING3. If not see #include "gimple-fold.h" #include "gimplify-me.h" -/* This file implements loop splitting, i.e. transformation of loops like +/* This file implements two kind of loop splitting. 
+ + One transformation of loops like: for (i = 0; i < 100; i++) { @@ -670,6 +674,782 @@ tree_ssa_split_loops (void) return 0; } + +/* Another transformation of loops like: + + for (i = INIT (); CHECK (i); i = NEXT ()) + { + if (expr (a_1, a_2, ..., a_n)) + a_j = ...; // change at least one a_j + else + S; // not change any a_j + } + + into: + + for (i = INIT (); CHECK (i); i = NEXT ()) + { + if (expr (a_1, a_2, ..., a_n)) + a_j = ...; + else + { + S; + i = NEXT (); + break; + } + } + + for (; CHECK (i); i = NEXT ()) + { + S; + } + + */ + +/* Data structure to hold temporary information during loop split upon + semi-invariant conditional statement. */ +class split_info { +public: + /* Array of all basic blocks in a loop, returned by get_loop_body(). */ + basic_block *bbs; + + /* All memory store/clobber statements in a loop. */ + auto_vec stores; + + /* Whether above memory stores vector has been filled. */ + bool set_stores; + + /* Semi-invariant conditional statement, upon which to split loop. */ + gcond *cond; + + split_info () : bbs (NULL), set_stores (false), cond (NULL) { } + + ~split_info () + { + if (bbs) + free (bbs); + } +}; + +/* Find all statements with memory-write effect in LOOP, including memory + store and non-pure function call, and keep those in a vector. This work + is only done one time, for the vector should be constant during analysis + stage of semi-invariant condition. */ + +static void +find_vdef_in_loop (struct loop *loop) +{ + split_info *info = (split_info *) loop->aux; + gphi *vphi = get_virtual_phi (loop->header); + + /* Indicate memory store vector has been filled. */ + info->set_stores = true; + + /* If loop contains memory operation, there must be a virtual PHI node in + loop header basic block. */ + if (vphi == NULL) + return; + + /* All virtual SSA names inside the loop are connected to be a cyclic + graph via virtual PHI nodes. The virtual PHI node in loop header just + links the first and the last virtual SSA names, by using the last as + PHI operand to define the first. */ + const edge latch = loop_latch_edge (loop); + const tree first = gimple_phi_result (vphi); + const tree last = PHI_ARG_DEF_FROM_EDGE (vphi, latch); + + /* The virtual SSA cyclic graph might consist of only one SSA name, who + is defined by itself. + + .MEM_1 = PHI <.MEM_2(loop entry edge), .MEM_1(latch edge)> + + This means the loop contains only memory loads, so we can skip it. */ + if (first == last) + return; + + auto_vec others; + auto_vec worklist; + auto_bitmap visited; + + bitmap_set_bit (visited, SSA_NAME_VERSION (first)); + bitmap_set_bit (visited, SSA_NAME_VERSION (last)); + worklist.safe_push (last); + + do + { + tree vuse = worklist.pop (); + gimple *stmt = SSA_NAME_DEF_STMT (vuse); + + /* We mark the first and last SSA names as visited at the beginning, + and reversely start the process from the last SSA name towards the + first, which ensures that this do-while will not touch SSA names + defined outside of the loop. */ + gcc_assert (gimple_bb (stmt) + && flow_bb_inside_loop_p (loop, gimple_bb (stmt))); + + if (gimple_code (stmt) == GIMPLE_PHI) + { + gphi *phi = as_a (stmt); + + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) + { + tree arg = gimple_phi_arg_def (stmt, i); + + if (bitmap_set_bit (visited, SSA_NAME_VERSION (arg))) + worklist.safe_push (arg); + } + } + else + { + tree prev = gimple_vuse (stmt); + + /* Non-pure call statement is conservatively assumed to impact all + memory locations. 
So place call statements ahead of other memory + stores in the vector with an idea of of using them as shortcut + terminators to memory alias analysis. */ + if (gimple_code (stmt) == GIMPLE_CALL) + info->stores.safe_push (stmt); + else + others.safe_push (stmt); + + if (bitmap_set_bit (visited, SSA_NAME_VERSION (prev))) + worklist.safe_push (prev); + } + } while (!worklist.is_empty ()); + + info->stores.safe_splice (others); +} + + +/* Given STMT, memory load or pure call statement, check whether it is impacted + by some memory store in LOOP, excluding trace starting from SKIP_HEAD (the + trace is composed of SKIP_HEAD and those basic block dominated by it, always + corresponds to one branch of a conditional statement). If SKIP_HEAD is + NULL, all basic blocks of LOOP are checked. */ + +static bool +vuse_semi_invariant_p (struct loop *loop, gimple *stmt, + const_basic_block skip_head) +{ + split_info *info = (split_info *) loop->aux; + + /* Collect memory store/clobber statements if have not do that. */ + if (!info->set_stores) + find_vdef_in_loop (loop); + + tree rhs = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE; + ao_ref ref; + gimple *store; + unsigned i; + + ao_ref_init (&ref, rhs); + + FOR_EACH_VEC_ELT (info->stores, i, store) + { + /* Skip basic blocks dominated by SKIP_HEAD, if non-NULL. */ + if (skip_head + && dominated_by_p (CDI_DOMINATORS, gimple_bb (store), skip_head)) + continue; + + if (!ref.ref || stmt_may_clobber_ref_p_1 (store, &ref)) + return false; + } + + return true; +} + +/* Forward declaration. */ + +static bool +stmt_semi_invariant_p (struct loop *loop, gimple *stmt, + const_basic_block skip_head); + +/* Suppose one condition branch, led by SKIP_HEAD, is not executed since + certain iteration of LOOP, check whether an SSA name (NAME) remains + unchanged in next interation. We call this characterisic as semi- + invariantness. SKIP_HEAD might be NULL, if so, nothing excluded, all + basic blocks and control flows in the loop will be considered. If non- + NULL, SSA name to check is supposed to be defined before SKIP_HEAD. */ + +static bool +ssa_semi_invariant_p (struct loop *loop, const tree name, + const_basic_block skip_head) +{ + gimple *def = SSA_NAME_DEF_STMT (name); + const_basic_block def_bb = gimple_bb (def); + + /* An SSA name defined outside a loop is definitely semi-invariant. */ + if (!def_bb || !flow_bb_inside_loop_p (loop, def_bb)) + return true; + + if (gimple_code (def) == GIMPLE_PHI) + { + /* For PHI node that is not in loop header, its source operands should + be defined inside the loop, which are seen as loop variant. */ + if (def_bb != loop->header || !skip_head) + return false; + + const_edge latch = loop_latch_edge (loop); + tree from = PHI_ARG_DEF_FROM_EDGE (as_a (def), latch); + + /* A PHI node in loop header contains two source operands, one is + initial value, the other is the copy of last iteration through loop + latch, we call it latch value. From the PHI node to definition + of latch value, if excluding branch trace from SKIP_HEAD, there + is no definition of other version of same variable, SSA name defined + by the PHI node is semi-invariant. + + loop entry + | .--- latch ---. + | | | + v v | + x_1 = PHI | + | | + v | + .------- if (cond) -------. | + | | | + | [ SKIP ] | + | | | + | x_2 = ... 
| + | | | + '---- T ---->.<---- F ----' | + | | + v | + x_3 = PHI | + | | + '----------------------' + + Suppose in certain iteration, execution flow in above graph goes + through true branch, which means that one source value to define + x_3 in false branch (x2) is skipped, x_3 only comes from x_1, and + x_1 in next iterations is defined by x_3, we know that x_1 will + never changed if COND always chooses true branch from then on. */ + + while (from != name) + { + /* A new value comes from a CONSTANT. */ + if (TREE_CODE (from) != SSA_NAME) + return false; + + gimple *stmt = SSA_NAME_DEF_STMT (from); + const_basic_block bb = gimple_bb (stmt); + + /* A new value comes from outside of loop. */ + if (!bb || !flow_bb_inside_loop_p (loop, bb)) + return false; + + from = NULL_TREE; + + if (gimple_code (stmt) == GIMPLE_PHI) + { + gphi *phi = as_a (stmt); + + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) + { + const_edge e = gimple_phi_arg_edge (phi, i); + + /* Not consider redefinitions in excluded basic blocks. */ + if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head)) + { + /* There are more than one source operands that can + provide value to the SSA name, it is variant. */ + if (from) + return false; + + from = gimple_phi_arg_def (phi, i); + } + } + } + else if (gimple_code (stmt) == GIMPLE_ASSIGN) + { + /* For simple value copy, check its rhs instead. */ + if (gimple_assign_ssa_name_copy_p (stmt)) + from = gimple_assign_rhs1 (stmt); + } + + /* Any other kind of definition is deemed to introduce a new value + to the SSA name. */ + if (!from) + return false; + } + return true; + } + + /* Value originated from volatile memory load or return of normal (non- + const/pure) call should not be treated as constant in each iteration. */ + if (gimple_has_side_effects (def)) + return false; + + /* Check if any memory store may kill memory load at this place. */ + if (gimple_vuse (def) && !vuse_semi_invariant_p (loop, def, skip_head)) + return false; + + /* Check operands of definition statement of the SSA name. */ + return stmt_semi_invariant_p (loop, def, skip_head); +} + +/* Check whether STMT is semi-invariant in LOOP, iff all its operands are + semi-invariant. Trace composed of basic block SKIP_HEAD and basic blocks + dominated by it are excluded from the loop. */ + +static bool +stmt_semi_invariant_p (struct loop *loop, gimple *stmt, + const_basic_block skip_head) +{ + ssa_op_iter iter; + tree use; + + /* Although operand of a statement might be SSA name, CONSTANT or VARDECL, + here we only need to check SSA name operands. This is because check on + VARDECL operands, which involve memory loads, must have been done + prior to invocation of this function in vuse_semi_invariant_p. */ + FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE) + { + if (!ssa_semi_invariant_p (loop, use, skip_head)) + return false; + } + + return true; +} + +/* Determine when conditional statement never transfers execution to one of its + branch, whether we can remove the branch's leading basic block (BRANCH_BB) + and those basic blocks dominated by BRANCH_BB. 
*/ + +static bool +branch_removable_p (basic_block branch_bb) +{ + if (single_pred_p (branch_bb)) + return true; + + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, branch_bb->preds) + { + if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb)) + continue; + + if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src)) + continue; + + /* The branch can be reached from opposite branch, or from some + statement not dominated by the conditional statement. */ + return false; + } + + return true; +} + +/* Find out which branch of a conditional statement (COND) is invariant in the + execution context of LOOP. That is: once the branch is selected in certain + iteration of the loop, any operand that contributes to computation of the + conditional statement remains unchanged in all following iterations. */ + +static int +get_cond_invariant_branch (struct loop *loop, gcond *cond) +{ + basic_block cond_bb = gimple_bb (cond); + basic_block targ_bb[2]; + bool invar[2]; + unsigned invar_checks; + + for (unsigned i = 0; i < 2; i++) + { + targ_bb[i] = EDGE_SUCC (cond_bb, i)->dest; + + /* One branch directs to loop exit, no need to perform loop split upon + this conditional statement. Firstly, it is trivial if the exit branch + is semi-invariant, for the statement is just to break loop. Secondly, + if the opposite branch is semi-invariant, it means that the statement + is real loop-invariant, which is covered by loop unswitch. */ + if (!flow_bb_inside_loop_p (loop, targ_bb[i])) + return -1; + } + + invar_checks = 0; + + for (unsigned i = 0; i < 2; i++) + { + invar[!i] = false; + + if (!branch_removable_p (targ_bb[i])) + continue; + + /* Given a semi-invariant branch, if its opposite branch dominates + loop latch, it and its following trace will only be executed in + final iteration of loop, namely it is not part of repeated body + of the loop. Similar to the above case that the branch is loop + exit, no need to split loop. */ + if (dominated_by_p (CDI_DOMINATORS, loop->latch, targ_bb[i])) + continue; + + invar[!i] = stmt_semi_invariant_p (loop, cond, targ_bb[i]); + invar_checks++; + } + + /* With both branches being invariant (handled by loop unswitch) or + variant is not what we want. */ + if (invar[0] ^ !invar[1]) + return -1; + + /* Found a real loop-invariant condition, do nothing. */ + if (invar_checks < 2 && stmt_semi_invariant_p (loop, cond, NULL)) + return -1; + + return invar[1]; +} + +/* Given a conditional statement in LOOP, whose basic block is COND_BB, + suppose its execution only goes through one of its branch, whose index is + specified by BRANCH. Return TRUE if this statement still executes multiple + times in one iteration of LOOP, in that the statement belongs a nested + unrecognized loop. */ + +static bool +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb, + int branch) +{ + basic_block branch_bb = EDGE_SUCC (cond_bb, branch)->dest; + + if (cond_bb == loop->header || branch_bb == loop->latch) + return false; + + basic_block *bbs = ((split_info *) loop->aux)->bbs; + auto_vec worklist; + + for (unsigned i = 0; i < loop->num_nodes; i++) + bbs[i]->flags &= ~BB_REACHABLE; + + /* Mark latch basic block as visited so as to terminate reachablility + traversal. */ + loop->latch->flags |= BB_REACHABLE; + + gcc_assert (flow_bb_inside_loop_p (loop, branch_bb)); + + /* Start from the specified branch, the opposite branch is ignored for it + will not be executed. 
*/ + branch_bb->flags |= BB_REACHABLE; + worklist.safe_push (branch_bb); + + do + { + basic_block bb = worklist.pop (); + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->succs) + { + basic_block succ_bb = e->dest; + + if (succ_bb == cond_bb) + return true; + + if (!flow_bb_inside_loop_p (loop, succ_bb)) + continue; + + if (succ_bb->flags & BB_REACHABLE) + continue; + + succ_bb->flags |= BB_REACHABLE; + worklist.safe_push (succ_bb); + } + } while (!worklist.is_empty ()); + + return false; +} + + +/* Calculate increased code size measured by estimated insn number if applying + loop split upon certain branch (BRANCH) of a conditional statement whose + basic block is COND_BB. */ + +static int +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb, + int branch) +{ + const_basic_block targ_bb_var = EDGE_SUCC (cond_bb, !branch)->dest; + basic_block *bbs = ((split_info *) loop->aux)->bbs; + int num = 0; + + for (unsigned i = 0; i < loop->num_nodes; i++) + { + /* Do no count basic blocks only in opposite branch. */ + if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var)) + continue; + + for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi); + gsi_next (&gsi)) + num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights); + } + + return num; +} + +/* Return true if it is eligible and profitable to perform loop split upon + a conditional statement COND in LOOP. */ + +static bool +can_split_loop_on_cond (struct loop *loop, gcond *cond) +{ + int branch = get_cond_invariant_branch (loop, cond); + + if (branch < 0) + return false; + + basic_block cond_bb = gimple_bb (cond); + profile_probability prob = EDGE_SUCC (cond_bb, branch)->probability; + + /* When accurate profile information is available, and execution + frequency of the branch is too low, just let it go. */ + if (prob.reliable_p ()) + { + int thres = PARAM_VALUE (PARAM_MIN_COND_LOOP_SPLIT_PROB); + + if (prob < profile_probability::always ().apply_scale (thres, 100)) + return false; + } + + /* Add a threshold for increased code size to disable loop split. */ + if (compute_added_num_insns (loop, cond_bb, branch) > + PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS)) + return false; + + /* Skip conditional statement that is inside a nested unrecognized loop. */ + if (is_cond_in_hidden_loop (loop, cond_bb, branch)) + return false; + + /* Temporarily keep branch index in conditional statement. */ + gimple_set_plf (cond, GF_PLF_1, branch); + return true; +} + +/* Traverse all conditional statements in LOOP, to find out a good candidate + upon which we can do loop split. */ + +static bool +mark_cond_to_split_loop (struct loop *loop) +{ + split_info *info = new split_info (); + basic_block *bbs = info->bbs = get_loop_body (loop); + + /* Allocate an area to keep temporary info, and associate its address + with loop aux field. */ + loop->aux = info; + + for (unsigned i = 0; i < loop->num_nodes; i++) + { + basic_block bb = bbs[i]; + + /* We only consider conditional statement, which be executed at most once + in each iteration of the loop. So skip statements in inner loops. */ + if (bb->loop_father != loop) + continue; + + /* Actually this check is not a must constraint. With it, we can ensure + conditional statement will always be executed in each iteration. 
*/ + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) + continue; + + gimple *last = last_stmt (bb); + + if (!last || gimple_code (last) != GIMPLE_COND) + continue; + + gcond *cond = as_a (last); + + if (can_split_loop_on_cond (loop, cond)) + { + info->cond = cond; + return true; + } + } + + delete info; + loop->aux = NULL; + + return false; +} + +/* Given a loop (LOOP1) with a chosen conditional statement candidate, perform + loop split transformation illustrated as the following graph. + + .-------T------ if (true) ------F------. + | .---------------. | + | | | | + v | v v + pre-header | pre-header + | .------------. | | .------------. + | | | | | | | + | v | | | v | + header | | header | + | | | | | + [ bool r = cond; ] | | | | + | | | | | + .---- if (r) -----. | | .--- if (true) ---. | + | | | | | | | + invariant | | | invariant | | + | | | | | | | + '---T--->.<---F---' | | '---T--->.<---F---' | + | | / | | + stmts | / stmts | + | | / | | + / \ | / / \ | + .-------* * [ if (!r) ] .-------* * | + | | | | | | + | latch | | latch | + | | | | | | + | '------------' | '------------' + '------------------------. .-----------' + loop1 | | loop2 + v v + exits + + In the graph, loop1 represents the part derived from original one, and + loop2 is duplicated using loop_version (), which corresponds to the part + of original one being splitted out. In loop1, a new bool temporary (r) + is introduced to keep value of the condition result. In original latch + edge of loop1, we insert a new conditional statement whose value comes + from previous temporary (r), one of its branch goes back to loop1 header + as a latch edge, and the other branch goes to loop2 pre-header as an entry + edge. And also in loop2, we abandon the variant branch of the conditional + statement candidate by setting a constant bool condition, based on which + branch is semi-invariant. */ + +static bool +split_loop_for_cond (struct loop *loop1) +{ + split_info *info = (split_info *) loop1->aux; + gcond *cond = info->cond; + basic_block cond_bb = gimple_bb (cond); + int branch = gimple_plf (cond, GF_PLF_1); + bool true_invar = !!(EDGE_SUCC (cond_bb, branch)->flags & EDGE_TRUE_VALUE); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n", + current_function_name (), loop1->num, + true_invar ? "T" : "F", cond_bb->index); + print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS); + } + + initialize_original_copy_tables (); + + struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, + profile_probability::always (), + profile_probability::never (), + profile_probability::always (), + profile_probability::always (), + true); + if (!loop2) + { + free_original_copy_tables (); + return false; + } + + /* Generate a bool type temporary to hold result of the condition. */ + tree tmp = make_ssa_name (boolean_type_node); + gimple_stmt_iterator gsi = gsi_last_bb (cond_bb); + gimple *stmt = gimple_build_assign (tmp, + gimple_cond_code (cond), + gimple_cond_lhs (cond), + gimple_cond_rhs (cond)); + + gsi_insert_before (&gsi, stmt, GSI_NEW_STMT); + gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node); + update_stmt (cond); + + basic_block cond_bb_copy = get_bb_copy (cond_bb); + gcond *cond_copy = as_a (last_stmt (cond_bb_copy)); + + /* Replace the condition in loop2 with a bool constant to let PassManager + remove the variant branch after current pass completes. 
*/ + if (true_invar) + gimple_cond_make_true (cond_copy); + else + gimple_cond_make_false (cond_copy); + + update_stmt (cond_copy); + + /* Insert a new conditional statement on latch edge of loop1. This + statement acts as a switch to transfer execution from loop1 to loop2, + when loop1 enters into invariant state. */ + basic_block latch_bb = split_edge (loop_latch_edge (loop1)); + basic_block break_bb = split_edge (single_pred_edge (latch_bb)); + gimple *break_cond = gimple_build_cond (EQ_EXPR, tmp, boolean_true_node, + NULL_TREE, NULL_TREE); + + gsi = gsi_last_bb (break_bb); + gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT); + + edge to_loop1 = single_succ_edge (break_bb); + edge to_loop2 = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0); + + to_loop1->flags &= ~EDGE_FALLTHRU; + to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE; + to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE; + + update_ssa (TODO_update_ssa); + + /* Due to introduction of a control flow edge from loop1 latch to loop2 + pre-header, we should update PHIs in loop2 to reflect this connection + between loop1 and loop2. */ + connect_loop_phis (loop1, loop2, to_loop2); + + free_original_copy_tables (); + + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1); + + return true; +} + +/* Main entry point to perform loop splitting for suitable if-conditions + in all loops. */ + +static unsigned int +tree_ssa_split_loops_for_cond (void) +{ + struct loop *loop; + auto_vec loop_list; + bool changed = false; + unsigned i; + + FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT) + loop->aux = NULL; + + /* Go through all loops starting from innermost. */ + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) + { + /* Put loop in a list if found a conditional statement candidate in it. + This is stage for analysis, not change anything of the function. */ + if (!loop->aux + && !optimize_loop_for_size_p (loop) + && mark_cond_to_split_loop (loop)) + loop_list.safe_push (loop); + + /* If any of our inner loops was split, don't split us, + and mark our containing loop as having had splits as well. */ + loop_outer (loop)->aux = loop->aux; + } + + FOR_EACH_VEC_ELT (loop_list, i, loop) + { + /* Extract selected loop and perform loop split. This is stage for + transformation. */ + changed |= split_loop_for_cond (loop); + + delete (split_info *) loop->aux; + } + + FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT) + loop->aux = NULL; + + if (changed) + return TODO_cleanup_cfg; + return 0; +} + + /* Loop splitting pass. 
*/ namespace { @@ -716,3 +1496,48 @@ make_pass_loop_split (gcc::context *ctxt) { return new pass_loop_split (ctxt); } + +namespace { + +const pass_data pass_data_cond_loop_split = +{ + GIMPLE_PASS, /* type */ + "cond_lsplit", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_COND_LOOP_SPLIT, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_cond_loop_split : public gimple_opt_pass +{ +public: + pass_cond_loop_split (gcc::context *ctxt) + : gimple_opt_pass (pass_data_cond_loop_split, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return flag_split_loops != 0; } + virtual unsigned int execute (function *); + +}; // class pass_cond_loop_split + +unsigned int +pass_cond_loop_split::execute (function *fun) +{ + if (number_of_loops (fun) <= 1) + return 0; + + return tree_ssa_split_loops_for_cond (); +} + +} // anon namespace + +gimple_opt_pass * +make_pass_cond_loop_split (gcc::context *ctxt) +{ + return new pass_cond_loop_split (ctxt); +}
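
To make the intent of the new cond_lsplit pass concrete, below is a minimal
hand-written C sketch. It only loosely mirrors the gcc.dg test case above;
the names (test, test_split, do_something, b) and the manually written split
are illustrative, not the pass's actual GIMPLE-level output.

/* A semi-invariant condition: "b" is only written in the true branch, so
   once "b" becomes zero inside the loop, no later iteration can make it
   nonzero again, and "if (b)" keeps choosing the false branch.  */

extern int do_something (void);
extern int b;

int
test (int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      if (b)
	b = do_something ();	/* May change b.  */
      sum += i;			/* Never changes b.  */
    }

  return sum;
}

/* The split written out by hand: the first loop runs only while "b" may
   still change; once the false branch is taken, the remaining iterations
   continue in a plain loop without the test, which then becomes a
   candidate for vectorization.  */

int
test_split (int n)
{
  int sum = 0;
  int i = 0;

  for (; i < n; i++)
    {
      if (b)
	b = do_something ();
      else
	break;
      sum += i;
    }

  for (; i < n; i++)
    sum += i;

  return sum;
}

Assuming loop splitting is enabled (the new pass reuses the -fsplit-loops
gate), a command along the lines of
  gcc -O3 -fsplit-loops -fdump-tree-cond_lsplit-details \
      --param max-cond-loop-split-insns=100 --param min-cond-loop-split-prob=30
should report a "split loop N at branch" line in the cond_lsplit dump for
code of this shape; 100 and 30 are simply the defaults from params.def,
spelled out explicitly.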