From patchwork Thu May 14 19:53:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 1290656
From: Andrew Stubbs
Subject: [committed] amdgcn: fix vcc clobber in vector load/store
To: "gcc-patches@gcc.gnu.org"
Date: Thu, 14 May 2020 20:53:06 +0100
List-Id: Gcc-patches mailing list

This fixes a wrong-code error that could occur when a vector reload was
inserted between a vector compare and the conditional branch that uses its
result.

The problem was that expanding the vector base address to the vector of
addresses needed by the ISA would clobber the VCC register. This was fine
for addresses expanded before LRA, but not for those expanded during or
after it. The fix clobbers CC_SAVE_REG instead, which is never long-lived.

Andrew

amdgcn: fix vcc clobber in vector load/store

This switches the code that expands scalar addresses to vectors of addresses
from using VCC to using CC_SAVE_REG, for the lo-part to hi-part carry values.
Clobbering VCC was harmless in code expanded in earlier passes, but addresses
expanded late, such as for stack spills or reloads, could clobber live VCC
values, causing execution failures.

This is the first target-specific testcase for GCN, so the new .exp file is
included.

2020-05-14  Andrew Stubbs

	gcc/
	* config/gcn/gcn-valu.md (add3_zext_dup): Change to a
	define_expand, and rename the original to ...
	(add3_vcc_zext_dup): ... this, and add a custom VCC operand.
	(add3_zext_dup_exec): Likewise, with ...
	(add3_vcc_zext_dup_exec): ... this.
	(add3_zext_dup2): Likewise, with ...
	(add3_vcc_zext_dup2): ... this.
	(add3_zext_dup2_exec): Likewise, with ...
	(add3_vcc_zext_dup2_exec): ... this.
	* config/gcn/gcn.c (gcn_expand_scalar_to_vector_address): Switch
	addv64di3_zext* calls to use addv64di3_vcc_zext*.

	gcc/testsuite/
	* gcc.target/gcn/gcn.exp: New file.
	* gcc.target/gcn/vcc-clobber.c: New file.

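For readers less familiar with the carry handling, here is a minimal C sketch
of the lo-part/hi-part split described in the commit message. It is a model
under stated assumptions, not the GCN implementation: add_zext_dup,
demo_vcc_clobber, lane_mask and the plain variables standing in for VCC and
CC_SAVE_REG are all hypothetical names chosen for illustration. The carry test
mirrors the (ltu (plus ...) ...) form used in the patterns: an unsigned sum
that is smaller than one of its addends has wrapped.

```c
#include <stdint.h>

#define LANES 64

/* Hypothetical model: a 64-bit mask with one bit per vector lane, standing
   in for a register such as VCC or CC_SAVE_REG.  */
typedef uint64_t lane_mask;

/* Add a zero-extended 32-bit scalar to every 64-bit lane.  The carry from
   the lo-part add is written to *carry_reg and then consumed by the hi-part
   add, so whatever *carry_reg held before the call is lost.  */
static void
add_zext_dup (uint64_t dst[LANES], uint32_t scalar,
              const uint64_t src[LANES], lane_mask *carry_reg)
{
  uint32_t lo[LANES];
  lane_mask carry = 0;

  /* Lo-part add: a wrapped unsigned sum is smaller than the addend.  */
  for (int i = 0; i < LANES; i++)
    {
      lo[i] = (uint32_t) src[i] + scalar;
      if (lo[i] < scalar)
        carry |= (lane_mask) 1 << i;
    }
  *carry_reg = carry;

  /* Hi-part add-with-carry: hi + 0 + carry bit for each lane.  */
  for (int i = 0; i < LANES; i++)
    {
      uint32_t hi = (uint32_t) (src[i] >> 32) + (uint32_t) ((carry >> i) & 1);
      dst[i] = ((uint64_t) hi << 32) | lo[i];
    }
}

/* Returns 0 if the model behaves as the commit message describes.  */
int
demo_vcc_clobber (void)
{
  const lane_mask compare_result = 0xff; /* pretend vector-compare result */
  uint64_t addr[LANES], out[LANES];

  for (int i = 0; i < LANES; i++)
    addr[i] = UINT64_C (0xffffffe0) + (uint64_t) i;

  /* Carry written to "VCC": the live compare result is overwritten.  */
  lane_mask vcc = compare_result;
  add_zext_dup (out, 0x10, addr, &vcc);
  if (vcc == compare_result)
    return 1;

  /* Carry written to a scratch register: the compare result survives.  */
  lane_mask cc_save = 0;
  vcc = compare_result;
  add_zext_dup (out, 0x10, addr, &cc_save);
  if (vcc != compare_result)
    return 2;
  if (cc_save != UINT64_C (0x00000000ffff0000))
    return 3; /* only lanes 16..31 wrap in the lo part */

  for (int i = 0; i < LANES; i++)
    if (out[i] != addr[i] + 0x10)
      return 4;
  return 0;
}
```

Calling demo_vcc_clobber () from a test harness should return 0. The point of
the sketch is only that the address sum is identical in both cases; what
differs is which mask register is left holding the carry bits afterwards.
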
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index d3badb4059c..a43d6b6c6f3 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -1379,135 +1379,206 @@
   [(set_attr "type" "vmult")
    (set_attr "length" "8")])
 
-(define_insn_and_split "add3_zext_dup"
-  [(set (match_operand:V_DI 0 "register_operand" "= v, v")
+(define_insn_and_split "add3_vcc_zext_dup"
+  [(set (match_operand:V_DI 0 "register_operand" "= v, v")
         (plus:V_DI
           (zero_extend:V_DI
             (vec_duplicate:
-              (match_operand:SI 1 "gcn_alu_operand" "BSv,ASv")))
-          (match_operand:V_DI 2 "gcn_alu_operand" "vDA,vDb")))
-   (clobber (reg:DI VCC_REG))]
+              (match_operand:SI 1 "gcn_alu_operand" " BSv, ASv")))
+          (match_operand:V_DI 2 "gcn_alu_operand" " vDA, vDb")))
+   (set (match_operand:DI 3 "register_operand" "=SgcV,SgcV")
+        (ltu:DI (plus:V_DI
+                  (zero_extend:V_DI (vec_duplicate: (match_dup 1)))
+                  (match_dup 2))
+                (match_dup 1)))]
   ""
   "#"
   "gcn_can_split_p (mode, operands[0])
    && gcn_can_split_p (mode, operands[2])"
   [(const_int 0)]
   {
-    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
     emit_insn (gen_add3_vcc_dup
                 (gcn_operand_part (mode, operands[0], 0),
                  gcn_operand_part (DImode, operands[1], 0),
                  gcn_operand_part (mode, operands[2], 0),
-                 vcc));
+                 operands[3]));
     emit_insn (gen_addc3
                 (gcn_operand_part (mode, operands[0], 1),
                  gcn_operand_part (mode, operands[2], 1),
-                 const0_rtx, vcc, vcc));
+                 const0_rtx, operands[3], operands[3]));
     DONE;
   }
   [(set_attr "type" "vmult")
    (set_attr "length" "8")])
 
-(define_insn_and_split "add3_zext_dup_exec"
-  [(set (match_operand:V_DI 0 "register_operand" "= v, v")
+(define_expand "add3_zext_dup"
+  [(match_operand:V_DI 0 "register_operand")
+   (match_operand:SI 1 "gcn_alu_operand")
+   (match_operand:V_DI 2 "gcn_alu_operand")]
+  ""
+  {
+    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
+    emit_insn (gen_add3_vcc_zext_dup (operands[0], operands[1],
+                                      operands[2], vcc));
+    DONE;
+  })
+
+(define_insn_and_split "add3_vcc_zext_dup_exec"
+  [(set (match_operand:V_DI 0 "register_operand" "= v, v")
         (vec_merge:V_DI
           (plus:V_DI
             (zero_extend:V_DI
               (vec_duplicate:
-                (match_operand:SI 1 "gcn_alu_operand" "ASv,BSv")))
-            (match_operand:V_DI 2 "gcn_alu_operand" "vDb,vDA"))
-          (match_operand:V_DI 3 "gcn_register_or_unspec_operand" " U0, U0")
-          (match_operand:DI 4 "gcn_exec_reg_operand" " e, e")))
-   (clobber (reg:DI VCC_REG))]
+                (match_operand:SI 1 "gcn_alu_operand" " ASv, BSv")))
+            (match_operand:V_DI 2 "gcn_alu_operand" " vDb, vDA"))
+          (match_operand:V_DI 4 "gcn_register_or_unspec_operand" " U0, U0")
+          (match_operand:DI 5 "gcn_exec_reg_operand" " e, e")))
+   (set (match_operand:DI 3 "register_operand" "=SgcV,SgcV")
+        (and:DI
+          (ltu:DI (plus:V_DI
+                    (zero_extend:V_DI (vec_duplicate: (match_dup 1)))
+                    (match_dup 2))
+                  (match_dup 1))
+          (match_dup 5)))]
   ""
   "#"
   "gcn_can_split_p (mode, operands[0])
    && gcn_can_split_p (mode, operands[2])
-   && gcn_can_split_p (mode, operands[3])"
+   && gcn_can_split_p (mode, operands[4])"
   [(const_int 0)]
   {
-    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
     emit_insn (gen_add3_vcc_dup_exec
                 (gcn_operand_part (mode, operands[0], 0),
                  gcn_operand_part (DImode, operands[1], 0),
                  gcn_operand_part (mode, operands[2], 0),
-                 vcc,
-                 gcn_operand_part (mode, operands[3], 0),
-                 operands[4]));
+                 operands[3],
+                 gcn_operand_part (mode, operands[4], 0),
+                 operands[5]));
     emit_insn (gen_addc3_exec
                 (gcn_operand_part (mode, operands[0], 1),
                  gcn_operand_part (mode, operands[2], 1),
-                 const0_rtx, vcc, vcc,
-                 gcn_operand_part (mode, operands[3], 1),
-                 operands[4]));
+                 const0_rtx, operands[3], operands[3],
+                 gcn_operand_part (mode, operands[4], 1),
+                 operands[5]));
     DONE;
   }
   [(set_attr "type" "vmult")
    (set_attr "length" "8")])
 
-(define_insn_and_split "add3_zext_dup2"
-  [(set (match_operand:V_DI 0 "register_operand" "= v")
+(define_expand "add3_zext_dup_exec"
+  [(match_operand:V_DI 0 "register_operand")
+   (match_operand:SI 1 "gcn_alu_operand")
+   (match_operand:V_DI 2 "gcn_alu_operand")
+   (match_operand:V_DI 3 "gcn_register_or_unspec_operand")
+   (match_operand:DI 4 "gcn_exec_reg_operand")]
+  ""
+  {
+    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
+    emit_insn (gen_add3_vcc_zext_dup_exec (operands[0], operands[1],
+                                           operands[2], vcc, operands[3],
+                                           operands[4]));
+    DONE;
+  })
+
+(define_insn_and_split "add3_vcc_zext_dup2"
+  [(set (match_operand:V_DI 0 "register_operand" "= v")
         (plus:V_DI
           (zero_extend:V_DI (match_operand: 1 "gcn_alu_operand" " vA"))
-          (vec_duplicate:V_DI (match_operand:DI 2 "gcn_alu_operand" "DbSv"))))
-   (clobber (reg:DI VCC_REG))]
+          (vec_duplicate:V_DI (match_operand:DI 2 "gcn_alu_operand" " DbSv"))))
+   (set (match_operand:DI 3 "register_operand" "=SgcV")
+        (ltu:DI (plus:V_DI
+                  (zero_extend:V_DI (match_dup 1))
+                  (vec_duplicate:V_DI (match_dup 2)))
+                (match_dup 1)))]
   ""
   "#"
   "gcn_can_split_p (mode, operands[0])"
   [(const_int 0)]
   {
-    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
     emit_insn (gen_add3_vcc_dup
                 (gcn_operand_part (mode, operands[0], 0),
                  gcn_operand_part (DImode, operands[2], 0),
                  operands[1],
-                 vcc));
+                 operands[3]));
     rtx dsthi = gcn_operand_part (mode, operands[0], 1);
     emit_insn (gen_vec_duplicate
                 (dsthi, gcn_operand_part (DImode, operands[2], 1)));
-    emit_insn (gen_addc3 (dsthi, dsthi, const0_rtx, vcc, vcc));
+    emit_insn (gen_addc3 (dsthi, dsthi, const0_rtx, operands[3],
+                          operands[3]));
     DONE;
   }
   [(set_attr "type" "vmult")
    (set_attr "length" "8")])
 
-(define_insn_and_split "add3_zext_dup2_exec"
-  [(set (match_operand:V_DI 0 "register_operand" "= v")
+(define_expand "add3_zext_dup2"
+  [(match_operand:V_DI 0 "register_operand")
+   (match_operand: 1 "gcn_alu_operand")
+   (match_operand:DI 2 "gcn_alu_operand")]
+  ""
+  {
+    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
+    emit_insn (gen_add3_vcc_zext_dup2 (operands[0], operands[1],
+                                       operands[2], vcc));
+    DONE;
+  })
+
+(define_insn_and_split "add3_vcc_zext_dup2_exec"
+  [(set (match_operand:V_DI 0 "register_operand" "= v")
         (vec_merge:V_DI
           (plus:V_DI
             (zero_extend:V_DI (match_operand: 1 "gcn_alu_operand" "vA"))
             (vec_duplicate:V_DI (match_operand:DI 2 "gcn_alu_operand" "BSv")))
-          (match_operand:V_DI 3 "gcn_register_or_unspec_operand" " U0")
-          (match_operand:DI 4 "gcn_exec_reg_operand" " e")))
-   (clobber (reg:DI VCC_REG))]
+          (match_operand:V_DI 4 "gcn_register_or_unspec_operand" " U0")
+          (match_operand:DI 5 "gcn_exec_reg_operand" " e")))
+   (set (match_operand:DI 3 "register_operand" "=SgcV")
+        (and:DI
+          (ltu:DI (plus:V_DI
+                    (zero_extend:V_DI (match_dup 1))
+                    (vec_duplicate:V_DI (match_dup 2)))
+                  (match_dup 1))
+          (match_dup 5)))]
   ""
   "#"
   "gcn_can_split_p (mode, operands[0])
-   && gcn_can_split_p (mode, operands[3])"
+   && gcn_can_split_p (mode, operands[4])"
   [(const_int 0)]
   {
-    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
     emit_insn (gen_add3_vcc_dup_exec
                 (gcn_operand_part (mode, operands[0], 0),
                  gcn_operand_part (DImode, operands[2], 0),
                  operands[1],
-                 vcc,
-                 gcn_operand_part (mode, operands[3], 0),
-                 operands[4]));
+                 operands[3],
+                 gcn_operand_part (mode, operands[4], 0),
+                 operands[5]));
     rtx dsthi = gcn_operand_part (mode, operands[0], 1);
     emit_insn (gen_vec_duplicate_exec
                 (dsthi, gcn_operand_part (DImode, operands[2], 1),
-                 gcn_operand_part (mode, operands[3], 1),
-                 operands[4]));
+                 gcn_operand_part (mode, operands[4], 1),
+                 operands[5]));
     emit_insn (gen_addc3_exec
-                (dsthi, dsthi, const0_rtx, vcc, vcc,
-                 gcn_operand_part (mode, operands[3], 1),
-                 operands[4]));
+                (dsthi, dsthi, const0_rtx, operands[3], operands[3],
+                 gcn_operand_part (mode, operands[4], 1),
+                 operands[5]));
     DONE;
   }
   [(set_attr "type" "vmult")
    (set_attr "length" "8")])
 
+(define_expand "add3_zext_dup2_exec"
+  [(match_operand:V_DI 0 "register_operand")
+   (match_operand: 1 "gcn_alu_operand")
+   (match_operand:DI 2 "gcn_alu_operand")
+   (match_operand:V_DI 3 "gcn_register_or_unspec_operand")
+   (match_operand:DI 4 "gcn_exec_reg_operand")]
+  ""
+  {
+    rtx vcc = gen_rtx_REG (DImode, VCC_REG);
+    emit_insn (gen_add3_vcc_zext_dup2_exec (operands[0], operands[1],
+                                            operands[2], vcc,
+                                            operands[3], operands[4]));
+    DONE;
+  })
+
 (define_insn_and_split "add3_sext_dup2"
   [(set (match_operand:V_DI 0 "register_operand" "= v")
         (plus:V_DI
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 38b5b98c7c8..39eb8fd283f 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -1786,9 +1786,10 @@ gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem,
 
   if (AS_FLAT_P (as))
     {
+      rtx vcc = gen_rtx_REG (DImode, CC_SAVE_REG);
+
       if (REG_P (tmp))
         {
-          rtx vcc = gen_rtx_REG (DImode, CC_SAVE_REG);
           rtx mem_base_lo = gcn_operand_part (DImode, mem_base, 0);
           rtx mem_base_hi = gcn_operand_part (DImode, mem_base, 1);
           rtx tmphi = gcn_operand_part (V64DImode, tmp, 1);
@@ -1809,17 +1810,17 @@ gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem,
                                                   vcc, vcc, undef_v64si, exec));
             }
           else
-            emit_insn (gen_addv64di3_zext_dup (tmp, mem_base_lo, tmp));
+            emit_insn (gen_addv64di3_vcc_zext_dup (tmp, mem_base_lo, tmp, vcc));
         }
       else
         {
           tmp = gen_reg_rtx (V64DImode);
           if (exec)
-            emit_insn (gen_addv64di3_zext_dup2_exec (tmp, tmplo, mem_base,
-                                                     gcn_gen_undef (V64DImode),
-                                                     exec));
+            emit_insn (gen_addv64di3_vcc_zext_dup2_exec
+                       (tmp, tmplo, mem_base, vcc, gcn_gen_undef (V64DImode),
+                        exec));
           else
-            emit_insn (gen_addv64di3_zext_dup2 (tmp, tmplo, mem_base));
+            emit_insn (gen_addv64di3_vcc_zext_dup2 (tmp, tmplo, mem_base, vcc));
         }
 
       new_base = tmp;
diff --git a/gcc/testsuite/gcc.target/gcn/gcn.exp b/gcc/testsuite/gcc.target/gcn/gcn.exp
new file mode 100644
index 00000000000..0e799e8bc80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/gcn.exp
@@ -0,0 +1,42 @@
+#   Specific regression driver for AMD GCN.
+#   Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AMD GCN target.
+if ![istarget amdgcn*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
+        "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/gcn/vcc-clobber.c b/gcc/testsuite/gcc.target/gcn/vcc-clobber.c
new file mode 100644
index 00000000000..e52733cf1e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/vcc-clobber.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+/* Test that gcn_expand_scalar_to_vector_address does not clobber VCC.
+   If it does then spills and reloads will be unsafe, leading to unexpected
+   conditional branch behaviour.  */
+
+extern void abort ();
+
+__attribute__((vector_size(256))) int vec[2] = {{0}, {0}};
+
+int
+main()
+{
+  long vcc = 0;
+
+  /* Load a known value into VCC.  The memory barrier ensures that the vector
+     load must happen after this point.  */
+  asm volatile ("s_mov_b32 vcc_lo, 0x12345689\n\t"
+                "s_mov_b32 vcc_hi, 0xabcdef0"
+                ::: "memory");
+
+  /* Compiler inserts vector load here.  */
+
+  /* Consume the arbitrary vector, and return the current value of VCC.  */
+  asm volatile ("; no-op" : "=cV"(vcc) : "v"(vec[0]), "v"(vec[1]));
+
+  /* The value should match the initialized value.  */
+  if (vcc != 0xabcdef012345689)
+    abort ();
+
+  return 0;
+}
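
A note on the constant checked by the testcase, since the two halves look
irregular at first glance: vcc_hi is loaded with the seven-digit value
0xabcdef0 and vcc_lo with 0x12345689, so the 64-bit register should read back
as 0x0abcdef012345689. The small host-side sketch below only restates that
arithmetic; combine_vcc is a hypothetical helper and is not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: join the two 32-bit halves of a 64-bit mask
   register, with vcc_hi in bits 63..32 and vcc_lo in bits 31..0.  */
uint64_t
combine_vcc (uint32_t vcc_hi, uint32_t vcc_lo)
{
  return ((uint64_t) vcc_hi << 32) | vcc_lo;
}
```

With the values from the asm statement, combine_vcc (0xabcdef0, 0x12345689)
equals 0xabcdef012345689, which is the literal the testcase compares against.
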