From patchwork Sun Nov 29 19:24:38 2020
X-Patchwork-Submitter: Roman Zhuykov
X-Patchwork-Id: 1407876
To: "gcc-patches@gcc.gnu.org"
Subject: [PATCH] modulo-sched: Carefully process loop counter initialization [PR97421]
Message-ID: <2c442c46-2f0c-e201-d456-65910a9e896d@ispras.ru>
Date: Sun, 29 Nov 2020 22:24:38 +0300
From: Roman Zhuykov
Cc: Jakub Jelinek, Alexander Monakov, Richard Biener, Alex Coplan

Hi all!

The same patch is attached with its commit message and inlined below.

It was successfully bootstrapped and regression-tested on aarch64-linux.
I also plan to briefly check an amd64 build before pushing.

I will push in a few days if there are no objections.
Any opinions about backports?

Roman
---
modulo-sched: Carefully process loop counter initialization [PR97421]

Do not allow direct adjustment of the pre-header initialization instruction
for the count register if it is read by some later instruction in that
basic block.

gcc/ChangeLog:

	PR rtl-optimization/97421
	* modulo-sched.c (generate_prolog_epilog): Remove forward
	declaration, adjust last argument name and type.
	(const_iteration_count): Add bool pointer parameter to return
	whether count register is read in pre-header after its
	initialization.
	(sms_schedule): Fix count register initialization adjustment
	procedure according to what const_iteration_count reports.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/97421
	* gcc.c-torture/execute/pr97421-1.c: New test.
	* gcc.c-torture/execute/pr97421-2.c: New test.
	* gcc.c-torture/execute/pr97421-3.c: New test.

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 6f699a874e..4568674aa6 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -210,8 +210,6 @@ static int sms_order_nodes (ddg_ptr, int, int *, int *);
 static void set_node_sched_params (ddg_ptr);
 static partial_schedule_ptr sms_schedule_by_order (ddg_ptr, int, int, int *);
 static void permute_partial_schedule (partial_schedule_ptr, rtx_insn *);
-static void generate_prolog_epilog (partial_schedule_ptr, class loop *,
-                                    rtx, rtx);
 static int calculate_stage_count (partial_schedule_ptr, int);
 static void calculate_must_precede_follow (ddg_node_ptr, int, int,
                                            int, int, sbitmap, sbitmap, sbitmap);
@@ -391,30 +389,40 @@ doloop_register_get (rtx_insn *head, rtx_insn *tail)
    this constant.  Otherwise return 0.  */
 static rtx_insn *
 const_iteration_count (rtx count_reg, basic_block pre_header,
-                       int64_t * count)
+                       int64_t *count, bool* adjust_inplace)
 {
   rtx_insn *insn;
   rtx_insn *head, *tail;
 
+  *adjust_inplace = false;
+  bool read_after = false;
+
   if (! pre_header)
     return NULL;
 
   get_ebb_head_tail (pre_header, pre_header, &head, &tail);
 
   for (insn = tail; insn != PREV_INSN (head); insn = PREV_INSN (insn))
-    if (NONDEBUG_INSN_P (insn) && single_set (insn) &&
-        rtx_equal_p (count_reg, SET_DEST (single_set (insn))))
+    if (single_set (insn) && rtx_equal_p (count_reg,
+                                          SET_DEST (single_set (insn))))
       {
         rtx pat = single_set (insn);
 
         if (CONST_INT_P (SET_SRC (pat)))
           {
             *count = INTVAL (SET_SRC (pat));
+            *adjust_inplace = !read_after;
             return insn;
           }
 
         return NULL;
       }
+    else if (NONDEBUG_INSN_P (insn) && reg_mentioned_p (count_reg, insn))
+      {
+        read_after = true;
+        if (reg_set_p (count_reg, insn))
+           break;
+      }
 
   return NULL;
 }
@@ -1126,7 +1134,7 @@ duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
 /* Generate the instructions (including reg_moves) for prolog & epilog.  */
 static void
 generate_prolog_epilog (partial_schedule_ptr ps, class loop *loop,
-                        rtx count_reg, rtx count_init)
+                        rtx count_reg, bool adjust_init)
 {
   int i;
   int last_stage = PS_STAGE_COUNT (ps) - 1;
@@ -1135,12 +1143,12 @@ generate_prolog_epilog (partial_schedule_ptr ps, class loop *loop,
   /* Generate the prolog, inserting its insns on the loop-entry edge.  */
   start_sequence ();
 
-  if (!count_init)
+  if (adjust_init)
     {
       /* Generate instructions at the beginning of the prolog to
-         adjust the loop count by STAGE_COUNT.  If loop count is constant
-         (count_init), this constant is adjusted by STAGE_COUNT in
-         generate_prolog_epilog function.  */
+         adjust the loop count by STAGE_COUNT.  If loop count is constant
+         and it is not used anywhere in prologue, this constant is adjusted
+         by STAGE_COUNT outside of generate_prolog_epilog function.  */
       rtx sub_reg = NULL_RTX;
 
       sub_reg = expand_simple_binop (GET_MODE (count_reg), MINUS, count_reg,
@@ -1528,7 +1536,8 @@ sms_schedule (void)
       rtx_insn *count_init;
       int mii, rec_mii, stage_count, min_cycle;
       int64_t loop_count = 0;
-      bool opt_sc_p;
+      bool opt_sc_p, adjust_inplace = false;
+      basic_block pre_header;
 
       if (! (g = g_arr[loop->num]))
         continue;
@@ -1569,19 +1578,13 @@ sms_schedule (void)
         }
 
 
-      /* In case of th loop have doloop register it gets special
-         handling.  */
-      count_init = NULL;
-      if ((count_reg = doloop_register_get (head, tail)))
-        {
-          basic_block pre_header;
-
-          pre_header = loop_preheader_edge (loop)->src;
-          count_init = const_iteration_count (count_reg, pre_header,
-                                              &loop_count);
-        }
+      count_reg = doloop_register_get (head, tail);
       gcc_assert (count_reg);
 
+      pre_header = loop_preheader_edge (loop)->src;
+      count_init = const_iteration_count (count_reg, pre_header, &loop_count,
+                                          &adjust_inplace);
+
       if (dump_file && count_init)
         {
           fprintf (dump_file, "SMS const-doloop ");
@@ -1701,9 +1704,20 @@ sms_schedule (void)
               print_partial_schedule (ps, dump_file);
             }
 
-          /* case the BCT count is not known , Do loop-versioning */
-          if (count_reg && ! count_init)
+          if (count_init)
+            {
+              if (adjust_inplace)
+                {
+                  /* When possible, set new iteration count of loop kernel in
+                     place.  Otherwise, generate_prolog_epilog creates an insn
+                     to adjust.  */
+                  SET_SRC (single_set (count_init)) = GEN_INT (loop_count
+                                                            - stage_count + 1);
+                }
+            }
+          else
             {
+              /* case the BCT count is not known , Do loop-versioning */
               rtx comp_rtx = gen_rtx_GT (VOIDmode, count_reg,
                                          gen_int_mode (stage_count,
                                                        GET_MODE (count_reg)));
@@ -1713,12 +1727,7 @@ sms_schedule (void)
               loop_version (loop, comp_rtx, &condition_bb,
                             prob, prob.invert (),
                             prob, prob.invert (), true);
-             }
-
-          /* Set new iteration count of loop kernel.  */
-          if (count_reg && count_init)
-            SET_SRC (single_set (count_init)) = GEN_INT (loop_count
-                                                     - stage_count + 1);
+            }
 
           /* Now apply the scheduled kernel to the RTL of the loop.  */
           permute_partial_schedule (ps, g->closing_branch->first_note);
@@ -1735,7 +1744,7 @@ sms_schedule (void)
           if (dump_file)
             print_node_sched_params (dump_file, g->num_nodes, ps);
           /* Generate prolog and epilog.  */
-          generate_prolog_epilog (ps, loop, count_reg, count_init);
+          generate_prolog_epilog (ps, loop, count_reg, !adjust_inplace);
           break;
         }
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr97421-1.c b/gcc/testsuite/gcc.c-torture/execute/pr97421-1.c
new file mode 100644
index 0000000000..e32fb129f1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr97421-1.c
@@ -0,0 +1,23 @@
+/* PR rtl-optimization/97421 */
+/* { dg-additional-options "-fmodulo-sched" } */
+
+int a, b, d, e;
+int *volatile c = &a;
+
+__attribute__((noinline))
+void f(void)
+{
+  for (int g = 2; g >= 0; g--) {
+    d = 0;
+    for (b = 0; b <= 2; b++)
+      ;
+    e = *c;
+  }
+}
+
+int main(void)
+{
+  f();
+  if (b != 3)
+    __builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr97421-2.c b/gcc/testsuite/gcc.c-torture/execute/pr97421-2.c
new file mode 100644
index 0000000000..142bcbcee9
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr97421-2.c
@@ -0,0 +1,18 @@
+/* PR rtl-optimization/97421 */
+/* { dg-additional-options "-fmodulo-sched -fno-dce -fno-strict-aliasing" } */
+
+static int a, b, c;
+int *d = &c;
+int **e = &d;
+int ***f = &e;
+int main()
+{
+  int h;
+  for (a = 2; a; a--)
+    for (h = 0; h <= 2; h++)
+      for (b = 0; b <= 2; b++)
+        ***f = 6;
+
+  if (b != 3)
+    __builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr97421-3.c b/gcc/testsuite/gcc.c-torture/execute/pr97421-3.c
new file mode 100644
index 0000000000..3f1485a4a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr97421-3.c
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/97421 */
+/* { dg-additional-options "-fmodulo-sched" } */
+
+int a, b, c;
+short d;
+void e(void) {
+  unsigned f = 0;
+  for (; f <= 2; f++) {
+    int g[1];
+    int h = (long)g;
+    c = 0;
+    for (; c < 10; c++)
+      g[0] = a = 0;
+    for (; a <= 2; a++)
+      b = d;
+  }
+}
+int main(void) {
+  e();
+  if (a != 3)
+    __builtin_abort();
+}
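
Illustration only, not part of the patch: the backward scan that const_iteration_count performs after this change can be modelled outside GCC. The sketch below rests on several assumptions. The names toy_kind, toy_insn, toy_const_iteration_count and toy_kernel_count are hypothetical and do not exist in GCC. Each toy insn is classified only by what it does to the count register. The reg_set_p early exit of the real function is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an RTL insn, described only by its effect
   on the loop count register.  Not GCC code.  */
enum toy_kind
{
  TOY_SET_CONST,     /* count_reg = <constant>  */
  TOY_SET_NONCONST,  /* count_reg = <something non-constant>  */
  TOY_READ,          /* mentions count_reg without being a single set of it  */
  TOY_OTHER          /* does not touch count_reg  */
};

struct toy_insn
{
  enum toy_kind kind;
  int64_t value;     /* Constant stored; meaningful for TOY_SET_CONST only.  */
};

/* Walk the pre-header from its last insn to its first.  Return the index
   of the constant initialization of the count register, or -1 if there
   is none.  *ADJUST_INPLACE is true only when no insn after that
   initialization reads the register.  */
static int
toy_const_iteration_count (const struct toy_insn *bb, int n,
                           int64_t *count, bool *adjust_inplace)
{
  bool read_after = false;

  assert (n >= 0);
  *adjust_inplace = false;

  for (int i = n - 1; i >= 0; i--)
    switch (bb[i].kind)
      {
      case TOY_SET_CONST:
        *count = bb[i].value;
        *adjust_inplace = !read_after;
        return i;
      case TOY_SET_NONCONST:
        return -1;
      case TOY_READ:
        read_after = true;
        break;
      case TOY_OTHER:
        break;
      }

  return -1;
}

/* Iteration count of the loop kernel, as written by the patch when the
   initialization is adjusted in place.  */
static int64_t
toy_kernel_count (int64_t loop_count, int stage_count)
{
  return loop_count - stage_count + 1;
}
```

For a pre-header consisting of a constant set followed by an unrelated insn, the model reports adjust_inplace as true, so the constant may be rewritten directly. If the second insn reads the count register instead, adjust_inplace is false, and generate_prolog_epilog has to emit the adjustment itself because sms_schedule passes !adjust_inplace.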