From patchwork Thu Jun 3 03:33:06 2021
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1486953
From: "Kewen.Lin"
To: richard.sandiford@arm.com
Cc: "Kewen.Lin via Gcc-patches", Bill Schmidt, Segher Boessenkool
Subject: [PATCH v2] predcom: Enabled by loop vect at O2 [PR100794]
Date: Thu, 3 Jun 2021 11:33:06 +0800
Message-ID: <4a817aa5-417e-fafb-9c28-52f93aa53082@linux.ibm.com>
List-Id: Gcc-patches mailing list

Hi Richard,

on 2021/6/3 1:19 AM, Richard Sandiford wrote:
> "Kewen.Lin via Gcc-patches" writes:
>> Hi,
>>
>> As PR100794 shows, in the current
>> implementation PRE bypasses some optimization to avoid introducing
>> a loop-carried dependence, which would stop the loop vectorizer from
>> vectorizing the loop.  At -O2, there is no downstream pass to re-catch
>> this kind of opportunity if the loop vectorizer fails to vectorize
>> that loop.
>>
>> This patch follows Richi's suggestion in the PR: if the predcom flag
>> isn't set explicitly and loop vectorization is enabled, predcom is
>> enabled implicitly, without any unrolling.  The Power9 SPEC2017
>> evaluation showed it can speed up 521.wrf_r by 3.30% and 554.roms_r
>> by 1.08% with the very-cheap cost model, with no remarkable impact
>> with the cheap cost model; the build time and size impact is fine
>> (see the PR for the details).
>>
>> By the way, I tested another proposal: guarding PRE so that it does
>> not skip the optimization for the cheap and very-cheap vect cost
>> models.  The evaluation results showed it's fine with the very-cheap
>> cost model, but it can degrade some benchmarks, e.g. 521.wrf_r
>> (-9.17%) and 549.fotonik3d_r (-2.07%).
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         PR tree-optimization/100794
>>         * tree-predcom.c (tree_predictive_commoning_loop): Add parameter
>>         allow_unroll_p and only allow unrolling when it's true.
>>         (tree_predictive_commoning): Add parameter allow_unroll_p and
>>         adjust for it.
>>         (run_tree_predictive_commoning): Likewise.
>>         (class pass_predcom): Add private member allow_unroll_p.
>>         (pass_predcom::pass_predcom): Init allow_unroll_p.
>>         (pass_predcom::gate): Check flag_tree_loop_vectorize and
>>         global_options_set.x_flag_predictive_commoning.
>>         (pass_predcom::execute): Adjust for allow_unroll_p.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR tree-optimization/100794
>>         * gcc.dg/tree-ssa/pr100794.c: New test.
>>
>>  gcc/testsuite/gcc.dg/tree-ssa/pr100794.c | 20 +++++++++
>>  gcc/tree-predcom.c                       | 57 +++++++++++++++++-------
>>  2 files changed, 60 insertions(+), 17 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
>> new file mode 100644
>> index 00000000000..6f707ae7fba
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-loop-vectorize -fdump-tree-pcom-details -fdisable-tree-vect" } */
>> +
>> +extern double arr[100];
>> +extern double foo (double, double);
>> +extern double sum;
>> +
>> +void
>> +test (int i_0, int i_n)
>> +{
>> +  int i;
>> +  for (i = i_0; i < i_n - 1; i++)
>> +    {
>> +      double a = arr[i];
>> +      double b = arr[i + 1];
>> +      sum += a * b;
>> +    }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } } */
>> diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
>> index 02f911a08bb..65a93c8e505 100644
>> --- a/gcc/tree-predcom.c
>> +++ b/gcc/tree-predcom.c
>> @@ -3178,13 +3178,13 @@ insert_init_seqs (class loop *loop, vec chains)
>>     applied to this loop.  */
>>
>>  static unsigned
>> -tree_predictive_commoning_loop (class loop *loop)
>> +tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
>>  {
>>    vec datarefs;
>>    vec dependences;
>>    struct component *components;
>>    vec chains = vNULL;
>> -  unsigned unroll_factor;
>> +  unsigned unroll_factor = 0;
>>    class tree_niter_desc desc;
>>    bool unroll = false, loop_closed_ssa = false;
>>
>> @@ -3272,11 +3272,13 @@ tree_predictive_commoning_loop (class loop *loop)
>>        dump_chains (dump_file, chains);
>>      }
>>
>> -  /* Determine the unroll factor, and if the loop should be unrolled, ensure
>> -     that its number of iterations is divisible by the factor.  */
>> -  unroll_factor = determine_unroll_factor (chains);
>> -  unroll = (unroll_factor > 1
>> -            && can_unroll_loop_p (loop, unroll_factor, &desc));
>> +  if (allow_unroll_p)
>> +    /* Determine the unroll factor, and if the loop should be unrolled, ensure
>> +       that its number of iterations is divisible by the factor.  */
>> +    unroll_factor = determine_unroll_factor (chains);
>> +
>> +  if (unroll_factor > 1)
>> +    unroll = can_unroll_loop_p (loop, unroll_factor, &desc);
>>
>>    /* Execute the predictive commoning transformations, and possibly unroll the
>>       loop.  */
>> @@ -3319,7 +3321,7 @@ tree_predictive_commoning_loop (class loop *loop)
>>  /* Runs predictive commoning.  */
>>
>>  unsigned
>> -tree_predictive_commoning (void)
>> +tree_predictive_commoning (bool allow_unroll_p)
>>  {
>>    class loop *loop;
>>    unsigned ret = 0, changed = 0;
>> @@ -3328,7 +3330,7 @@ tree_predictive_commoning (void)
>>    FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
>>      if (optimize_loop_for_speed_p (loop))
>>        {
>> -        changed |= tree_predictive_commoning_loop (loop);
>> +        changed |= tree_predictive_commoning_loop (loop, allow_unroll_p);
>>        }
>>    free_original_copy_tables ();
>>
>> @@ -3355,12 +3357,12 @@ tree_predictive_commoning (void)
>>  /* Predictive commoning Pass.  */
>>
>>  static unsigned
>> -run_tree_predictive_commoning (struct function *fun)
>> +run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p)
>>  {
>>    if (number_of_loops (fun) <= 1)
>>      return 0;
>>
>> -  return tree_predictive_commoning ();
>> +  return tree_predictive_commoning (allow_unroll_p);
>>  }
>>
>>  namespace {
>> @@ -3382,15 +3384,36 @@ class pass_predcom : public gimple_opt_pass
>>  {
>>  public:
>>    pass_predcom (gcc::context *ctxt)
>> -    : gimple_opt_pass (pass_data_predcom, ctxt)
>> +    : gimple_opt_pass (pass_data_predcom, ctxt),
>> +      allow_unroll_p (true)
>>    {}
>>
>>    /* opt_pass methods: */
>> -  virtual bool gate (function *) { return flag_predictive_commoning != 0; }
>> -  virtual unsigned int execute (function *fun)
>> -    {
>> -      return run_tree_predictive_commoning (fun);
>> -    }
>> +  virtual bool
>> +  gate (function *)
>> +  {
>> +    if (flag_predictive_commoning != 0)
>> +      return true;
>> +    /* Loop vectorization enables predictive commoning implicitly
>> +       only if predictive commoning isn't set explicitly, and it
>> +       doesn't allow unrolling.  */
>> +    if (flag_tree_loop_vectorize
>> +        && !global_options_set.x_flag_predictive_commoning)
>> +      {
>> +        allow_unroll_p = false;
>> +        return true;
>> +      }
>> +    return false;
>> +  }
>> +
>> +  virtual unsigned int
>> +  execute (function *fun)
>> +  {
>> +    return run_tree_predictive_commoning (fun, allow_unroll_p);
>> +  }
>> +
>> +private:
>> +  bool allow_unroll_p;
>>
>>  }; // class pass_predcom
>
> Calculating allow_unroll_p this way doesn't look robust against
> changes in options caused by pragmas, etc.  Would it work if we
> dropped the member variable and just passed flag_predictive_commoning != 0
> to run_tree_predictive_commoning?
>

Thanks for the comments!  Yeah, it would work well.  The updated version
of the patch has been attached.
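The robustness concern can be illustrated with a small model.  The sketch
below is not GCC code: `Options`, `CachingPass` and `StatelessPass` are
hypothetical names that only mimic the shape of the v1 and v2 gate/execute
logic, under the assumption that option state can differ from one function
to the next (for example through an optimize pragma).  In the v1 shape, once
gate has cleared the cached member for one function, nothing sets it back to
true for a later function that enables predictive commoning explicitly.

```cpp
#include <cassert>

// Hypothetical stand-in for per-function option state; not GCC's real types.
struct Options {
  bool predictive_commoning;    // models flag_predictive_commoning != 0
  bool predcom_set_explicitly;  // models global_options_set.x_flag_predictive_commoning
  bool tree_loop_vectorize;     // models flag_tree_loop_vectorize
};

// v1 shape: gate() caches the decision in a member that is never reset.
struct CachingPass {
  bool allow_unroll_p = true;

  bool gate(const Options &o) {
    if (o.predictive_commoning)
      return true;  // leaves allow_unroll_p at whatever it was before
    if (o.tree_loop_vectorize && !o.predcom_set_explicitly) {
      allow_unroll_p = false;
      return true;
    }
    return false;
  }

  bool execute_allows_unroll() const { return allow_unroll_p; }
};

// v2 shape: no member; the decision is recomputed from the current options.
struct StatelessPass {
  bool gate(const Options &o) const {
    if (o.predictive_commoning)
      return true;
    return o.tree_loop_vectorize && !o.predcom_set_explicitly;
  }

  bool execute_allows_unroll(const Options &o) const {
    return o.predictive_commoning;
  }
};

// Run the pass over two functions in sequence and report whether the
// second one would be allowed to unroll.
bool caching_allows_unroll_for_second(const Options &first, const Options &second) {
  CachingPass pass;
  pass.gate(first);
  pass.gate(second);
  return pass.execute_allows_unroll();
}

bool stateless_allows_unroll_for_second(const Options &first, const Options &second) {
  StatelessPass pass;
  pass.gate(first);
  pass.gate(second);
  return pass.execute_allows_unroll(second);
}

// Demonstration: first function enables predcom implicitly, second explicitly.
void demo() {
  const Options implicit_only{false, false, true};
  const Options explicit_on{true, true, true};
  assert(!caching_allows_unroll_for_second(implicit_only, explicit_on));
  assert(stateless_allows_unroll_for_second(implicit_only, explicit_on));
}
```

In this model the caching variant wrongly denies unrolling to the second
function, while the stateless variant follows the options in effect when
execute runs, which is the behaviour the v2 patch aims for.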
BR,
Kewen
-----
gcc/ChangeLog:

        PR tree-optimization/100794
        * tree-predcom.c (tree_predictive_commoning_loop): Add parameter
        allow_unroll_p and only allow unrolling when it's true.
        (tree_predictive_commoning): Add parameter allow_unroll_p and
        adjust for it.
        (run_tree_predictive_commoning): Likewise.
        (pass_predcom::gate): Check flag_tree_loop_vectorize and
        global_options_set.x_flag_predictive_commoning.
        (pass_predcom::execute): Adjust for allow_unroll_p.

gcc/testsuite/ChangeLog:

        PR tree-optimization/100794
        * gcc.dg/tree-ssa/pr100794.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
new file mode 100644
index 00000000000..6f707ae7fba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-vectorize -fdump-tree-pcom-details -fdisable-tree-vect" } */
+
+extern double arr[100];
+extern double foo (double, double);
+extern double sum;
+
+void
+test (int i_0, int i_n)
+{
+  int i;
+  for (i = i_0; i < i_n - 1; i++)
+    {
+      double a = arr[i];
+      double b = arr[i + 1];
+      sum += a * b;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } } */
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 02f911a08bb..ac1674d5486 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -3178,13 +3178,13 @@ insert_init_seqs (class loop *loop, vec chains)
    applied to this loop.  */

 static unsigned
-tree_predictive_commoning_loop (class loop *loop)
+tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
 {
   vec datarefs;
   vec dependences;
   struct component *components;
   vec chains = vNULL;
-  unsigned unroll_factor;
+  unsigned unroll_factor = 0;
   class tree_niter_desc desc;
   bool unroll = false, loop_closed_ssa = false;

@@ -3272,11 +3272,13 @@ tree_predictive_commoning_loop (class loop *loop)
       dump_chains (dump_file, chains);
     }

-  /* Determine the unroll factor, and if the loop should be unrolled, ensure
-     that its number of iterations is divisible by the factor.  */
-  unroll_factor = determine_unroll_factor (chains);
-  unroll = (unroll_factor > 1
-            && can_unroll_loop_p (loop, unroll_factor, &desc));
+  if (allow_unroll_p)
+    /* Determine the unroll factor, and if the loop should be unrolled, ensure
+       that its number of iterations is divisible by the factor.  */
+    unroll_factor = determine_unroll_factor (chains);
+
+  if (unroll_factor > 1)
+    unroll = can_unroll_loop_p (loop, unroll_factor, &desc);

   /* Execute the predictive commoning transformations, and possibly unroll the
      loop.  */
@@ -3319,7 +3321,7 @@ tree_predictive_commoning_loop (class loop *loop)
 /* Runs predictive commoning.  */

 unsigned
-tree_predictive_commoning (void)
+tree_predictive_commoning (bool allow_unroll_p)
 {
   class loop *loop;
   unsigned ret = 0, changed = 0;
@@ -3328,7 +3330,7 @@ tree_predictive_commoning (void)
   FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
     if (optimize_loop_for_speed_p (loop))
       {
-        changed |= tree_predictive_commoning_loop (loop);
+        changed |= tree_predictive_commoning_loop (loop, allow_unroll_p);
       }
   free_original_copy_tables ();

@@ -3355,12 +3357,12 @@ tree_predictive_commoning (void)
 /* Predictive commoning Pass.  */

 static unsigned
-run_tree_predictive_commoning (struct function *fun)
+run_tree_predictive_commoning (struct function *fun, bool allow_unroll_p)
 {
   if (number_of_loops (fun) <= 1)
     return 0;

-  return tree_predictive_commoning ();
+  return tree_predictive_commoning (allow_unroll_p);
 }

 namespace {
@@ -3386,11 +3388,27 @@ public:
   {}

   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_predictive_commoning != 0; }
-  virtual unsigned int execute (function *fun)
-    {
-      return run_tree_predictive_commoning (fun);
-    }
+  virtual bool
+  gate (function *)
+  {
+    if (flag_predictive_commoning != 0)
+      return true;
+    /* Loop vectorization enables predictive commoning implicitly
+       only if predictive commoning isn't set explicitly, and it
+       doesn't allow unrolling.  */
+    if (flag_tree_loop_vectorize
+        && !global_options_set.x_flag_predictive_commoning)
+      return true;
+
+    return false;
+  }
+
+  virtual unsigned int
+  execute (function *fun)
+  {
+    bool allow_unroll_p = flag_predictive_commoning != 0;
+    return run_tree_predictive_commoning (fun, allow_unroll_p);
+  }

 }; // class pass_predcom
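
For readers unfamiliar with the transformation, the following is a
hand-written sketch of what predictive commoning without unrolling is
expected to do to the loop in the new test.  It is an illustration only, not
compiler output: the function names are made up, and the array and the sum
are passed as a parameter and a return value rather than globals so that the
snippet is self-contained.

```cpp
#include <cassert>

// Shape of the loop in pr100794.c: two loads from arr per iteration.
double sum_products_naive(const double *arr, int i_0, int i_n) {
  double sum = 0.0;
  for (int i = i_0; i < i_n - 1; i++) {
    double a = arr[i];
    double b = arr[i + 1];
    sum += a * b;
  }
  return sum;
}

// Hand-written equivalent after commoning: the value loaded as arr[i + 1]
// is carried into the next iteration, where it plays the role of arr[i].
double sum_products_commoned(const double *arr, int i_0, int i_n) {
  double sum = 0.0;
  if (i_0 >= i_n - 1)
    return sum;           // loop body would not run; avoid the initial load
  double a = arr[i_0];    // initial value, loaded once before the loop
  for (int i = i_0; i < i_n - 1; i++) {
    double b = arr[i + 1];  // the only load left in the loop body
    sum += a * b;
    a = b;                  // loop-carried copy introduced by the reuse
  }
  return sum;
}

// Small self-check that both forms agree on a four-element array.
void self_check() {
  const double data[4] = {1.0, 2.0, 3.0, 4.0};
  assert(sum_products_naive(data, 0, 4) == sum_products_commoned(data, 0, 4));
}
```

Each iteration of the second form issues one load instead of two, at the
cost of the `a = b` copy.  Presumably the unrolling that predcom normally
performs exists to get rid of such copies, and that is the step the implicit
`allow_unroll_p == false` path leaves out.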