From patchwork Wed Feb 21 10:03:17 2024
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1902036
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH] Allow mode-switching to introduce internal loops [PR113220]
Date: Wed, 21 Feb 2024 10:03:17 +0000
In this PR, the SME mode-switching code needs to insert a stack-probe
loop for an alloca.  This patch allows the target to do that.

There are two parts to it: allowing loops for insertions in blocks, and
allowing them for insertions on edges.  The former can be handled
entirely within mode-switching itself, by recording which blocks have
had new branches inserted.  The latter requires an extension to
commit_one_edge_insertion.

I think the extension to commit_one_edge_insertion makes logical sense,
since it already explicitly allows internal loops during RTL expansion.
The single-block find_sub_basic_blocks is a relatively recent addition,
so wouldn't have been available when the code was originally written.

The patch also has a small and obvious fix to make the aarch64 emit
hook cope with labels.

I've added specific -fstack-clash-protection versions of all
aarch64-sme.exp tests that previously failed because of this bug.
I've also added -fno-stack-clash-protection to the original versions
of these tests if they contain scans that assume no protection.

Tested on aarch64-linux-gnu.  OK to install?

Richard

gcc/
	PR target/113220
	* cfgrtl.cc (commit_one_edge_insertion): Handle sequences that
	contain jumps even if called after initial RTL expansion.
	* mode-switching.cc: Include cfgbuild.h.
	(optimize_mode_switching): Allow the sequence returned by the
	emit hook to contain internal jumps.  Record which blocks
	contain such jumps and split the blocks at the end.
	* config/aarch64/aarch64.cc (aarch64_mode_emit): Check for
	non-debug insns when scanning the sequence.

gcc/testsuite/
	PR target/113220
	* gcc.target/aarch64/sme/call_sm_switch_5.c: Add
	-fno-stack-clash-protection.
	* gcc.target/aarch64/sme/call_sm_switch_5_scp.c: New test.
	* gcc.target/aarch64/sme/sibcall_6_scp.c: New test.
	* gcc.target/aarch64/sme/za_state_4.c: Add
	-fno-stack-clash-protection.
	* gcc.target/aarch64/sme/za_state_4_scp.c: New test.
	* gcc.target/aarch64/sme/za_state_5.c: Add
	-fno-stack-clash-protection.
	* gcc.target/aarch64/sme/za_state_5_scp.c: New test.
---
 gcc/cfgrtl.cc                                 | 27 ++++++++++++++-----
 gcc/config/aarch64/aarch64.cc                 |  2 ++
 gcc/mode-switching.cc                         | 15 +++++++++++
 .../gcc.target/aarch64/sme/call_sm_switch_5.c |  2 +-
 .../aarch64/sme/call_sm_switch_5_scp.c        |  3 +++
 .../gcc.target/aarch64/sme/sibcall_6_scp.c    |  3 +++
 .../gcc.target/aarch64/sme/za_state_4.c       |  2 +-
 .../gcc.target/aarch64/sme/za_state_4_scp.c   |  3 +++
 .../gcc.target/aarch64/sme/za_state_5.c       |  2 +-
 .../gcc.target/aarch64/sme/za_state_5_scp.c   |  3 +++
 10 files changed, 53 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5_scp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_6_scp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_4_scp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_5_scp.c
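(Reviewer note, not part of the patch: the cfgrtl.cc and mode-switching.cc
hunks below follow the same shape: scan the freshly emitted sequence for
jumps, repair the jump label chain, and defer splitting the containing
block until all sequences are in place.  The sketch here just distils that
shared shape for reference; note_jumps_and_split is a made-up name, and the
snippet assumes the usual GCC internal headers (rtl.h, cfgbuild.h, sbitmap.h)
rather than being something the patch itself adds.)

/* Sketch only: record whether SEQ, emitted into block BB, contains any
   jumps.  If so, fix up the label chain now and mark BB for resplitting;
   the caller is expected to run find_sub_basic_blocks (for one block) or
   find_many_sub_basic_blocks (for the recorded set) once every sequence
   has been inserted, as the real code in this patch does.  */

static bool
note_jumps_and_split (rtx_insn *seq, basic_block bb, sbitmap jumping_blocks)
{
  for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
    if (JUMP_P (insn))
      {
	/* The labels in SEQ are not in the jump label chain yet.  */
	rebuild_jump_labels_chain (seq);
	/* Defer the split itself until all insertions are done.  */
	bitmap_set_bit (jumping_blocks, bb->index);
	return true;
      }
  return false;
}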
diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 15259c5e984..304c429c99b 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -2018,6 +2018,21 @@ commit_one_edge_insertion (edge e)
   insns = e->insns.r;
   e->insns.r = NULL;
 
+  /* Allow the sequence to contain internal jumps, such as a memcpy loop
+     or an allocation loop.  If such a sequence is emitted during RTL
+     expansion, we'll create the appropriate basic blocks later,
+     at the end of the pass.  But if such a sequence is emitted after
+     initial expansion, we'll need to find the subblocks ourselves.  */
+  bool contains_jump = false;
+  if (!currently_expanding_to_rtl)
+    for (rtx_insn *insn = insns; insn; insn = NEXT_INSN (insn))
+      if (JUMP_P (insn))
+	{
+	  rebuild_jump_labels_chain (insns);
+	  contains_jump = true;
+	  break;
+	}
+
   /* Figure out where to put these insns.  If the destination has
      one predecessor, insert there.  Except for the exit block.  */
   if (single_pred_p (e->dest) && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
@@ -2112,13 +2127,13 @@ commit_one_edge_insertion (edge e)
 	  delete_insn (before);
 	}
       else
-	/* Some builtin expanders, such as those for memset and memcpy,
-	   may generate loops and conditionals, and those may get emitted
-	   into edges.  That's ok while expanding to rtl, basic block
-	   boundaries will be identified and split afterwards.  ??? Need
-	   we check whether the destination labels of any inserted jumps
-	   are also part of the inserted sequence?  */
+	/* Sequences inserted after RTL expansion are expected to be SESE,
+	   with only internal branches allowed.  If the sequence jumps outside
+	   itself then we do not know how to add the associated edges here.  */
 	gcc_assert (!JUMP_P (last) || currently_expanding_to_rtl);
+
+      if (contains_jump)
+	find_sub_basic_blocks (bb);
     }
 
   /* Update the CFG for all queued instructions.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 76f1230d81a..104f7e1831e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -29513,6 +29513,8 @@ aarch64_mode_emit (int entity, int mode, int prev_mode, HARD_REG_SET live)
   HARD_REG_SET clobbers = {};
   for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
     {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
       vec_rtx_properties properties;
       properties.add_insn (insn, false);
       for (rtx_obj_reference ref : properties.refs ())
diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 01754e9eed4..583929184ce 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "lcm.h"
 #include "cfgcleanup.h"
 #include "tree-pass.h"
+#include "cfgbuild.h"
 
 /* We want target macros for the mode switching code to be able to
    refer to instruction attribute values.  */
@@ -1105,6 +1106,8 @@ optimize_mode_switching (void)
   edge_list = pre_edge_lcm_avs (n_entities * max_num_modes, transp, comp,
 				antic, kill, avin, avout, &insert, &del);
 
+  auto_sbitmap jumping_blocks (last_basic_block_for_fn (cfun));
+  bitmap_clear (jumping_blocks);
   for (j = n_entities - 1; j >= 0; j--)
     {
       int no_mode = num_modes[entity_map[j]];
@@ -1215,6 +1218,13 @@ optimize_mode_switching (void)
 	      /* Insert MODE_SET only if it is nonempty.  */
 	      if (mode_set != NULL_RTX)
 		{
+		  for (auto insn = mode_set; insn; insn = NEXT_INSN (insn))
+		    if (JUMP_P (insn))
+		      {
+			rebuild_jump_labels_chain (mode_set);
+			bitmap_set_bit (jumping_blocks, bb->index);
+			break;
+		      }
 		  emitted = true;
 		  if (NOTE_INSN_BASIC_BLOCK_P (ptr->insn_ptr))
 		    /* We need to emit the insns in a FIFO-like manner,
@@ -1252,6 +1262,11 @@ optimize_mode_switching (void)
   sbitmap_vector_free (avin);
   sbitmap_vector_free (avout);
 
+  gcc_assert (SBITMAP_SIZE ((sbitmap) jumping_blocks)
+	      == (unsigned int) last_basic_block_for_fn (cfun));
+  if (!bitmap_empty_p (jumping_blocks))
+    find_many_sub_basic_blocks (jumping_blocks);
+
   if (need_commit)
     commit_edge_insertions ();
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
index 6238ab80da2..d31b6b91f1f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
@@ -1,4 +1,4 @@
-// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -funwind-tables" }
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -funwind-tables -fno-stack-clash-protection" }
 // { dg-final { check-function-bodies "**" "" } }
 
 #include
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5_scp.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5_scp.c
new file mode 100644
index 00000000000..09c64464907
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5_scp.c
@@ -0,0 +1,3 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -funwind-tables" }
+
+#include "call_sm_switch_5.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_6_scp.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_6_scp.c
new file mode 100644
index 00000000000..4f2e5746ecc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_6_scp.c
@@ -0,0 +1,3 @@
+/* { dg-options "-O2 -fstack-clash-protection" } */
+
+#include "sibcall_6.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
index cec0abf0ea9..f27470bb8a0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
@@ -1,4 +1,4 @@
-// { dg-options "-O -fno-optimize-sibling-calls" }
+// { dg-options "-O -fno-optimize-sibling-calls -fno-stack-clash-protection" }
 // { dg-final { check-function-bodies "**" "" } }
 
 void private_za();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_4_scp.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4_scp.c
new file mode 100644
index 00000000000..1620f881d36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4_scp.c
@@ -0,0 +1,3 @@
+// { dg-options "-O -fno-optimize-sibling-calls -fstack-clash-protection" }
+
+#include "za_state_4.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c
index d54840d3d77..52c96b3c32c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c
@@ -1,4 +1,4 @@
-// { dg-options "-O2 -fno-optimize-sibling-calls" }
+// { dg-options "-O2 -fno-optimize-sibling-calls -fno-stack-clash-protection" }
 // { dg-final { check-function-bodies "**" "" } }
 
 void private_za();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_5_scp.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5_scp.c
new file mode 100644
index 00000000000..8b4d6e12949
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5_scp.c
@@ -0,0 +1,3 @@
+// { dg-options "-O -fno-optimize-sibling-calls -fstack-clash-protection" }
+
+#include "za_state_5.c"