From patchwork Fri Jun 4 08:04:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiufu Guo X-Patchwork-Id: 1487656 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=TocJl2T1; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FxFg41y44z9sRf for ; Fri, 4 Jun 2021 18:05:02 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 043373890431 for ; Fri, 4 Jun 2021 08:05:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 043373890431 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1622793900; bh=k61Vq1FSRX6TyPTALrCGdSav8yjyaqvaDmn93+j3CBA=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=TocJl2T1/H21a7XnfGZV2QabG7b+/wfsNHriMU9y5aodJlFSFN3rVvSXOqAc902Es FTTazxGfGqj4CXiyDtCynRNwjluT7lBFDf9V2EPpJDCzrOqhg9CmD4q7NgFt93YfuO EQlScmXFJ2PKlmhZg+Q3QQjHvhT0zWLkMWPEqpJM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id DCD403851C26 for ; Fri, 4 Jun 2021 08:04:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DCD403851C26 Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15483X0J021674; Fri, 4 Jun 2021 04:04:27 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 38yg35rcgd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 04 Jun 2021 04:04:27 -0400 Received: from m0098404.ppops.net (m0098404.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 154840vs024803; Fri, 4 Jun 2021 04:04:27 -0400 Received: from ppma01fra.de.ibm.com (46.49.7a9f.ip4.static.sl-reverse.com [159.122.73.70]) by mx0a-001b2d01.pphosted.com with ESMTP id 38yg35rcfn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 04 Jun 2021 04:04:27 -0400 Received: from pps.filterd (ppma01fra.de.ibm.com [127.0.0.1]) by ppma01fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 154803ZB027573; Fri, 4 Jun 2021 08:04:24 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma01fra.de.ibm.com with ESMTP id 38ud889w92-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 04 Jun 2021 08:04:24 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 15484La331981916 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 4 Jun 2021 08:04:21 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8A911AE053; Fri, 4 Jun 2021 08:04:21 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7BF82AE045; Fri, 4 Jun 2021 08:04:20 +0000 (GMT) Received: from pike.rch.stglabs.ibm.com (unknown [9.5.12.127]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 4 Jun 2021 08:04:20 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH V3] Split loop for NE condition. Date: Fri, 4 Jun 2021 16:04:19 +0800 Message-Id: <20210604080419.119328-1-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.17.1 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: tgNgGAGIMVwElZBmna94zMWKwmMPpLya X-Proofpoint-ORIG-GUID: o6kktJh9lh6v2NaBvibt2FiQY54ruZwc X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-06-04_05:2021-06-04, 2021-06-04 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 spamscore=0 clxscore=1011 impostorscore=0 malwarescore=0 priorityscore=1501 lowpriorityscore=0 suspectscore=0 mlxlogscore=999 adultscore=0 mlxscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2106040064 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jiufu Guo via Gcc-patches From: Jiufu Guo Reply-To: Jiufu Guo Cc: rguenther@suse.de, segher@kernel.crashing.org, jlaw@tachyum.com, wschmidt@linux.ibm.com, dje.gcc@gmail.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Update the patch since v2: . Check index and bound from gcond before checking if wrap. . Update test case, and add an executable case. . Refine code comments. . Enhance the checking for i++/++i in the loop header. . Enhance code to handle equal condition on exit Bootstrap and regtest pass on powerpc64le, and also pass regtest on bootstrap-O3. Is this ok for trunk? BR. Jiufu Guo. When there is the possibility that wrap may happen on the loop index, a few optimizations would not happen. For example code: foo (int *a, int *b, unsigned k, unsigned n) { while (++k != n) a[k] = b[k] + 1; } For this code, if "k > n", k would wrap. if "k < n" at begining, it could be optimized (e.g. vectorization). We can split the loop into two loops: while (++k > n) a[k] = b[k] + 1; while (k++ < n) a[k] = b[k] + 1; This patch splits this kind of loop to achieve better performance. gcc/ChangeLog: 2021-06-04 Jiufu Guo * tree-ssa-loop-split.c (connect_loop_phis): Add new param. (get_ne_cond_branch): New function. (split_ne_loop): New function. (split_loop_on_ne_cond): New function. (tree_ssa_split_loops): Use split_loop_on_ne_cond. gcc/testsuite/ChangeLog: 2021-06-04 Jiufu Guo * gcc.dg/loop-split1.c: New test. * gcc.dg/loop-split2.c: New test. * g++.dg/vect/pr98064.cc: Suppress warning. --- gcc/testsuite/g++.dg/vect/pr98064.cc | 4 +- gcc/testsuite/gcc.dg/loop-split1.c | 101 +++++++++++ gcc/testsuite/gcc.dg/loop-split2.c | 54 ++++++ gcc/tree-ssa-loop-split.c | 251 ++++++++++++++++++++++++++- 4 files changed, 404 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/loop-split1.c create mode 100644 gcc/testsuite/gcc.dg/loop-split2.c diff --git a/gcc/testsuite/g++.dg/vect/pr98064.cc b/gcc/testsuite/g++.dg/vect/pr98064.cc index 74043ce7725..dcb2985d05a 100644 --- a/gcc/testsuite/g++.dg/vect/pr98064.cc +++ b/gcc/testsuite/g++.dg/vect/pr98064.cc @@ -1,5 +1,7 @@ // { dg-do compile } -// { dg-additional-options "-O3" } +// { dg-additional-options "-O3 -Wno-stringop-overflow" } +/* There is warning message when "short g = var_8; g; g++" + is optimized/analyzed as string operation,e.g. memset. */ const long long &min(const long long &__a, long long &__b) { if (__b < __a) diff --git a/gcc/testsuite/gcc.dg/loop-split1.c b/gcc/testsuite/gcc.dg/loop-split1.c new file mode 100644 index 00000000000..dd2d03a7b96 --- /dev/null +++ b/gcc/testsuite/gcc.dg/loop-split1.c @@ -0,0 +1,101 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */ + +void +foo (int *a, int *b, unsigned l, unsigned n) +{ + while (++l != n) + a[l] = b[l] + 1; +} +void +foo_1 (int *a, int *b, unsigned n) +{ + unsigned l = 0; + while (++l != n) + a[l] = b[l] + 1; +} + +void +foo1 (int *a, int *b, unsigned l, unsigned n) +{ + while (l++ != n) + a[l] = b[l] + 1; +} + +/* No wrap. */ +void +foo1_1 (int *a, int *b, unsigned n) +{ + unsigned l = 0; + while (l++ != n) + a[l] = b[l] + 1; +} + +unsigned +foo2 (char *a, char *b, unsigned l, unsigned n) +{ + while (++l != n) + if (a[l] != b[l]) + break; + + return l; +} + +unsigned +foo2_1 (char *a, char *b, unsigned l, unsigned n) +{ + l = 0; + while (++l != n) + if (a[l] != b[l]) + break; + + return l; +} + +unsigned +foo3 (char *a, char *b, unsigned l, unsigned n) +{ + while (l++ != n) + if (a[l] != b[l]) + break; + + return l; +} + +/* No wrap. */ +unsigned +foo3_1 (char *a, char *b, unsigned l, unsigned n) +{ + l = 0; + while (l++ != n) + if (a[l] != b[l]) + break; + + return l; +} + +void +bar (); +void +foo4 (unsigned n, unsigned i) +{ + do + { + if (i == n) + return; + bar (); + ++i; + } + while (1); +} + +unsigned +find_skip_diff (char *p, char *q, unsigned n, unsigned i) +{ + while (p[i] == q[i] && ++i != n) + p++, q++; + + return i; +} + +/* { dg-final { scan-tree-dump-times "Loop split" 8 "lsplit" } } */ diff --git a/gcc/testsuite/gcc.dg/loop-split2.c b/gcc/testsuite/gcc.dg/loop-split2.c new file mode 100644 index 00000000000..0d3fded3f61 --- /dev/null +++ b/gcc/testsuite/gcc.dg/loop-split2.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +extern void abort (void); +extern void exit (int); + +#define NI __attribute__ ((noinline)) + +void NI +foo (int *a, int *b, unsigned char l, unsigned char n) +{ + while (++l != n) + a[l] = b[l] + 1; +} + +unsigned NI +bar (int *a, int *b, unsigned char l, unsigned char n) +{ + while (l++ != n) + if (a[l] != b[l]) + break; + + return l; +} + +int a[258]; +int b[258]; + +int main() +{ + __builtin_memcpy (b, a, sizeof (a)); + + if (bar (a, b, 3, 8) != 9) + abort (); + + if (bar (a, b, 8, 3) != 4) + abort (); + + b[100] += 1; + if (bar (a, b, 90, 110) != 100) + abort (); + + if (bar (a, b, 110, 105) != 100) + abort (); + + foo (a, b, 99, 99); + a[99] = b[99] + 1; + for (int i = 0; i < 256; i++) + if (a[i] != b[i] + 1) + abort(); + + exit (0); +} + diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c index 3a09bbc39e5..9c0de66e288 100644 --- a/gcc/tree-ssa-loop-split.c +++ b/gcc/tree-ssa-loop-split.c @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include "cfghooks.h" #include "gimple-fold.h" #include "gimplify-me.h" +#include "tree-ssa-loop-ivopts.h" /* This file implements two kinds of loop splitting. @@ -229,11 +230,14 @@ easy_exit_values (class loop *loop) conditional). I.e. the second loop can now be entered either via the original entry or via NEW_E, so the entry values of LOOP2 phi nodes are either the original ones or those at the exit - of LOOP1. Insert new phi nodes in LOOP2 pre-header reflecting - this. The loops need to fulfill easy_exit_values(). */ + of LOOP1. Selecting the previous value instead next value as the + exit value of LOOP1 if USE_PREV is true. Insert new phi nodes in + LOOP2 pre-header reflecting this. The loops need to fulfill + easy_exit_values(). */ static void -connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e) +connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e, + bool use_prev = false) { basic_block rest = loop_preheader_edge (loop2)->src; gcc_assert (new_e->dest == rest); @@ -279,7 +283,8 @@ connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e) gphi * newphi = create_phi_node (new_init, rest); add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION); - add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION); + add_phi_arg (newphi, use_prev ? PHI_RESULT (phi_first) : next, new_e, + UNKNOWN_LOCATION); SET_USE (op, new_init); } } @@ -1593,6 +1598,241 @@ split_loop_on_cond (struct loop *loop) return do_split; } +/* Check if the LOOP exit branch is like "if (idx != bound)", + Return the branch edge which exit loop, if wrap may happen on "idx". */ + +static edge +get_ne_cond_branch (struct loop *loop) +{ + int i; + edge e; + + auto_vec edges = get_loop_exit_edges (loop); + FOR_EACH_VEC_ELT (edges, i, e) + { + basic_block bb = e->src; + + /* Check if exit at gcond. */ + gimple *last = last_stmt (bb); + if (!last || gimple_code (last) != GIMPLE_COND) + continue; + gcond *cond = as_a (last); + enum tree_code code = gimple_cond_code (cond); + if (!(code == NE_EXPR + || (code == EQ_EXPR && (e->flags & EDGE_TRUE_VALUE)))) + continue; + + /* Check if bound is invarant. */ + tree idx = gimple_cond_lhs (cond); + tree bnd = gimple_cond_rhs (cond); + if (expr_invariant_in_loop_p (loop, idx)) + std::swap (idx, bnd); + else if (!expr_invariant_in_loop_p (loop, bnd)) + continue; + + /* Only unsigned type conversion could cause wrap. */ + tree type = TREE_TYPE (idx); + if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME + || !TYPE_UNSIGNED (type)) + continue; + + /* Avoid to split if bound is MAX/MIN val. */ + tree bound_type = TREE_TYPE (bnd); + if (TREE_CODE (bnd) == INTEGER_CST && INTEGRAL_TYPE_P (bound_type) + && (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type)) + || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type)))) + continue; + + /* Check if there is possible wrap. */ + class tree_niter_desc niter; + if (!number_of_iterations_exit (loop, e, &niter, false, false)) + continue; + if (niter.control.no_overflow) + return NULL; + if (niter.cmp != NE_EXPR) + continue; + + /* If exit edge is just before the empty latch, it is easy to link + the split loops: just jump from the exit edge of one loop to the + header of new loop. */ + if (single_pred_p (loop->latch) + && single_pred_edge (loop->latch)->src == bb + && empty_block_p (loop->latch)) + return e; + + /* If exit edge is at end of header, and header contains i++ or ++i, + only, it is simple to link the split loops: jump from the end of + one loop header to the new loop header, and use unchanged PHI + result of first loop as the entry PHI value of the second loop. */ + if (bb == loop->header) + { + /* Only one phi. */ + gphi_iterator psi = gsi_start_phis (bb); + if (gsi_end_p (psi)) + continue; + gphi *phi = psi.phi (); + gsi_next (&psi); + if (!gsi_end_p (psi)) + continue; + + /* Check it is ++i or ++i */ + tree next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop)); + tree prev = PHI_RESULT (phi); + if (idx != prev && idx != next) + continue; + + gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb); + if (gsi_end_p (gsi)) + continue; + gimple *s1 = gsi_stmt (gsi); + if (!is_gimple_assign (s1) || gimple_assign_lhs (s1) != next + || gimple_assign_rhs1 (s1) != prev) + continue; + + gsi_next_nondebug (&gsi); + if (!gsi_end_p (gsi) && gsi_stmt (gsi) == cond) + return e; + } + } + + return NULL; +} + +/* Split the LOOP with NE_EXPR into two loops with GT_EXPR and LT_EXPR. */ + +static bool +split_ne_loop (struct loop *loop, edge cond_e) +{ + initialize_original_copy_tables (); + + struct loop *loop2 = loop_version (loop, boolean_true_node, NULL, + profile_probability::always (), + profile_probability::never (), + profile_probability::always (), + profile_probability::always (), true); + + gcc_assert (loop2); + update_ssa (TODO_update_ssa); + + basic_block loop2_cond_exit_bb = get_bb_copy (cond_e->src); + free_original_copy_tables (); + + gcond *gc = as_a (last_stmt (cond_e->src)); + gcond *dup_gc = as_a (last_stmt (loop2_cond_exit_bb)); + + /* Invert edges for gcond. */ + if (gimple_cond_code (gc) == EQ_EXPR) + { + auto invert_edge = [](basic_block bb) { + edge out = EDGE_SUCC (bb, 0); + edge in = EDGE_SUCC (bb, 1); + if (in->flags & EDGE_TRUE_VALUE) + std::swap (in, out); + in->flags |= EDGE_TRUE_VALUE; + in->flags &= ~EDGE_FALSE_VALUE; + out->flags |= EDGE_FALSE_VALUE; + out->flags &= ~EDGE_TRUE_VALUE; + }; + + invert_edge (gimple_bb (gc)); + invert_edge (gimple_bb (dup_gc)); + } + + /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */ + bool inv = expr_invariant_in_loop_p (loop, gimple_cond_lhs (gc)); + enum tree_code up_code = inv ? LT_EXPR : GT_EXPR; + enum tree_code down_code = inv ? GT_EXPR : LT_EXPR; + + gimple_cond_set_code (gc, up_code); + gimple_cond_set_code (dup_gc, down_code); + + /* Link the exit cond edge to new loop. */ + gcond *break_cond = as_a (gimple_copy (gc)); + edge pred_e = single_pred_edge (loop->latch); + bool simple_loop + = pred_e && pred_e->src == cond_e->src && empty_block_p (loop->latch); + if (simple_loop) + gimple_cond_set_code (break_cond, down_code); + else + gimple_cond_make_true (break_cond); + + basic_block break_bb = split_edge (cond_e); + gimple_stmt_iterator gsi = gsi_last_bb (break_bb); + gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT); + + edge to_exit = single_succ_edge (break_bb); + edge to_new_loop = make_edge (break_bb, loop_preheader_edge (loop2)->src, 0); + to_new_loop->flags |= EDGE_TRUE_VALUE; + to_exit->flags |= EDGE_FALSE_VALUE; + to_exit->flags &= ~EDGE_FALLTHRU; + to_exit->probability = cond_e->probability; + to_new_loop->probability = to_exit->probability.invert (); + + update_ssa (TODO_update_ssa); + + connect_loop_phis (loop, loop2, to_new_loop, !simple_loop); + + rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, ";; Loop split on wrap index.\n"); + + return true; +} + +/* Checks if LOOP contains a suitable NE_EXPR conditional block to split. +L_H: + if (i!=N) + S; + else + goto EXIT; + i++; + goto L_H; + +The "i!=N" is like "i>N || iN) + S; + else + goto EXIT; + i++; + goto L_H; +L1_H: + if (inum_nodes)) + { + free (bbs); + return false; + } + + for (unsigned i = 0; i < loop->num_nodes; i++) + num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights); + free (bbs); + + if (num > param_max_peeled_insns) + return false; + + edge branch_edge = get_ne_cond_branch (loop); + if (branch_edge && split_ne_loop (loop, branch_edge)) + return true; + + return false; +} + /* Main entry point. Perform loop splitting on all suitable loops. */ static unsigned int @@ -1622,7 +1862,8 @@ tree_ssa_split_loops (void) if (optimize_loop_for_size_p (loop)) continue; - if (split_loop (loop) || split_loop_on_cond (loop)) + if (split_loop (loop) || split_loop_on_cond (loop) + || split_loop_on_ne_cond (loop)) { /* Mark our containing loop as having had some split inner loops. */ loop_outer (loop)->aux = loop;