From patchwork Tue Aug 18 19:15:30 2020
X-Patchwork-Submitter: Aaron Sawdey
X-Patchwork-Id: 1347242
To: gcc-patches@gcc.gnu.org
Subject: [committed] rs6000: unaligned VSX in memcpy/memmove expansion
Date: Tue, 18 Aug 2020 14:15:30 -0500
Message-Id: <20200818191530.85137-1-acsawdey@linux.ibm.com>
From: Aaron Sawdey <acsawdey@linux.ibm.com>
Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org

I've modified this slightly per Will's and Segher's comments,
re-regstrapped, and am posting what I actually committed.

  Aaron

This patch adds a few new instructions to the inline expansion of
memcpy/memmove.  Generation of all of these is controlled by the
option -mblock-ops-unaligned-vsx, which is set on by default if the
target has TARGET_EFFICIENT_UNALIGNED_VSX:

* unaligned VSX load/store (V2DImode).

* unaligned VSX pair load/store (POImode), which is additionally
  controlled by -mblock-ops-vector-pair in case it is not wanted at
  some point.  The default for -mblock-ops-vector-pair is on if the
  target has TARGET_MMA and TARGET_EFFICIENT_UNALIGNED_VSX.  This is
  redundant for now, but it is nice for the future to clearly specify
  what is required.

* unaligned VSX lxvl/stxvl, but generally only to do the remainder of
  a copy/move we started with some VSX loads/stores; we still prefer
  lb/lh/lw/ld if the remainder is 1/2/4/8 bytes.

Testing of this is accomplished by gcc.dg/memcmp-1.c, which does two
memcpy() calls for each memcmp().  If the memcpy() calls don't do the
right thing, the memcmp() will fail unexpectedly.

gcc/ChangeLog:

	* config/rs6000/rs6000-string.c (gen_lxvl_stxvl_move): New
	helper function.
	(expand_block_move): Add lxvl/stxvl, vector pair, and
	unaligned VSX.
	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	Default value for -mblock-ops-vector-pair.
	* config/rs6000/rs6000.opt: Add -mblock-ops-vector-pair.
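To illustrate (this example is not part of the patch): a hypothetical
fixed-size copy of the kind this expansion targets.  Assuming a
power10 target (so TARGET_MMA and TARGET_EFFICIENT_UNALIGNED_VSX are
set) and a size under rs6000_block_move_inline_limit, a 37-byte
memcpy could now expand to one 32-byte vector pair load/store plus
one lxvl/stxvl pair for the 5-byte tail (5 is not 1/2/4/8, so the
gpr path is not preferred):

#include <string.h>

/* Hypothetical test case, compiled e.g. with -O2 -mcpu=power10.
   37 = 32 + 5: the 32-byte head can use a vector pair (POImode),
   and the 5-byte remainder can use lxvl/stxvl.  */
void
copy37 (void *dst, const void *src)
{
  memcpy (dst, src, 37);	/* Expanded inline, no call to memcpy.  */
}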
---
 gcc/config/rs6000/rs6000-string.c | 103 ++++++++++++++++++++++++++----
 gcc/config/rs6000/rs6000.c        |  14 +++-
 gcc/config/rs6000/rs6000.opt      |   4 ++
 3 files changed, 105 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-string.c b/gcc/config/rs6000/rs6000-string.c
index c35d93180ca..82cc24ecdda 100644
--- a/gcc/config/rs6000/rs6000-string.c
+++ b/gcc/config/rs6000/rs6000-string.c
@@ -2708,6 +2708,32 @@ gen_lvx_v4si_move (rtx dest, rtx src)
     return gen_altivec_lvx_v4si_internal (dest, src);
 }
 
+static rtx
+gen_lxvl_stxvl_move (rtx dest, rtx src, int length)
+{
+  gcc_assert (MEM_P (dest) ^ MEM_P (src));
+  gcc_assert (GET_MODE (dest) == V16QImode && GET_MODE (src) == V16QImode);
+  gcc_assert (length <= 16);
+
+  bool is_store = MEM_P (dest);
+  rtx addr;
+
+  /* If the address form is not a simple register, make it so.  */
+  if (is_store)
+    addr = XEXP (dest, 0);
+  else
+    addr = XEXP (src, 0);
+
+  if (!REG_P (addr))
+    addr = force_reg (Pmode, addr);
+
+  rtx len = force_reg (DImode, gen_int_mode (length, DImode));
+  if (is_store)
+    return gen_stxvl (src, addr, len);
+  else
+    return gen_lxvl (dest, addr, len);
+}
+
 /* Expand a block move operation, and return 1 if successful.  Return
    0 if we should let the compiler generate normal code.
 
@@ -2750,18 +2776,56 @@ expand_block_move (rtx operands[], bool might_overlap)
   if (bytes > rs6000_block_move_inline_limit)
     return 0;
 
+  int orig_bytes = bytes;
   for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
     {
       union {
-	rtx (*movmemsi) (rtx, rtx, rtx, rtx);
 	rtx (*mov) (rtx, rtx);
+	rtx (*movlen) (rtx, rtx, int);
       } gen_func;
       machine_mode mode = BLKmode;
       rtx src, dest;
-
-      /* Altivec first, since it will be faster than a string move
-	 when it applies, and usually not significantly larger.  */
-      if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
+      bool move_with_length = false;
+
+      /* Use POImode for paired vsx load/store.  Use V2DI for single
+	 unaligned vsx load/store, for consistency with what other
+	 expansions (compare) already do, and so we can use lxvd2x on
+	 p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, VSX
+	 with length < 16 (if allowed), then gpr load/store.  */
+
+      if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
+	  && TARGET_BLOCK_OPS_VECTOR_PAIR
+	  && bytes >= 32
+	  && (align >= 256 || !STRICT_ALIGNMENT))
+	{
+	  move_bytes = 32;
+	  mode = POImode;
+	  gen_func.mov = gen_movpoi;
+	}
+      else if (TARGET_POWERPC64 && TARGET_BLOCK_OPS_UNALIGNED_VSX
+	       && VECTOR_MEM_VSX_P (V2DImode)
+	       && bytes >= 16 && (align >= 128 || !STRICT_ALIGNMENT))
+	{
+	  move_bytes = 16;
+	  mode = V2DImode;
+	  gen_func.mov = gen_vsx_movv2di_64bit;
+	}
+      else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
+	       && TARGET_POWER10 && bytes < 16
+	       && orig_bytes > 16
+	       && !(bytes == 1 || bytes == 2
+		    || bytes == 4 || bytes == 8)
+	       && (align >= 128 || !STRICT_ALIGNMENT))
+	{
+	  /* Only use lxvl/stxvl if it could replace multiple ordinary
+	     loads+stores.  Also don't use it unless we likely already
+	     did one vsx copy so we aren't mixing gpr and vsx.  */
+	  move_bytes = bytes;
+	  mode = V16QImode;
+	  gen_func.movlen = gen_lxvl_stxvl_move;
+	  move_with_length = true;
+	}
+      else if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
 	{
 	  move_bytes = 16;
 	  mode = V4SImode;
@@ -2818,7 +2882,16 @@ expand_block_move (rtx operands[], bool might_overlap)
 	  gen_func.mov = gen_movqi;
 	}
 
-      /* Mode is always set to something other than BLKmode by one of the
+      /* If we can't succeed in doing the move in one pass, we can't
+	 do it in the might_overlap case.  Bail out and return
+	 failure.  We test num_reg + 1 >= MAX_MOVE_REG here to check
+	 the same condition as the test of num_reg >= MAX_MOVE_REG
+	 that is done below after the increment of num_reg.  */
+      if (might_overlap && num_reg + 1 >= MAX_MOVE_REG
+	  && bytes > move_bytes)
+	return 0;
+
+      /* Mode is always set to something other than BLKmode by one of the
 	 cases of the if statement above.  */
       gcc_assert (mode != BLKmode);
 
@@ -2826,15 +2899,17 @@ expand_block_move (rtx operands[], bool might_overlap)
       src = adjust_address (orig_src, mode, offset);
       dest = adjust_address (orig_dest, mode, offset);
 
       rtx tmp_reg = gen_reg_rtx (mode);
-
-      loads[num_reg]    = (*gen_func.mov) (tmp_reg, src);
-      stores[num_reg++] = (*gen_func.mov) (dest, tmp_reg);
-      /* If we didn't succeed in doing it in one pass, we can't do it in the
-	 might_overlap case.  Bail out and return failure.  */
-      if (might_overlap && num_reg >= MAX_MOVE_REG
-	  && bytes > move_bytes)
-	return 0;
+      if (move_with_length)
+	{
+	  loads[num_reg] = (*gen_func.movlen) (tmp_reg, src, move_bytes);
+	  stores[num_reg++] = (*gen_func.movlen) (dest, tmp_reg, move_bytes);
+	}
+      else
+	{
+	  loads[num_reg] = (*gen_func.mov) (tmp_reg, src);
+	  stores[num_reg++] = (*gen_func.mov) (dest, tmp_reg);
+	}
 
       /* Emit loads and stores saved up.  */
       if (num_reg >= MAX_MOVE_REG || bytes == move_bytes)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fe93cf6ff2b..1c1caa90ede 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4018,6 +4018,14 @@ rs6000_option_override_internal (bool global_init_p)
 	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_UNALIGNED_VSX;
     }
 
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
+    {
+      if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
+	rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
+      else
+	rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
+    }
+
   /* Use long double size to select the appropriate long double.  We use
      TYPE_PRECISION to differentiate the 3 different long double types.  We
      map 128 into the precision used for TFmode.  */
@@ -23222,8 +23230,10 @@ struct rs6000_opt_mask {
 static struct rs6000_opt_mask const rs6000_opt_masks[] =
 {
   { "altivec",			OPTION_MASK_ALTIVEC,	false, true  },
-  { "block-ops-unaligned-vsx", OPTION_MASK_BLOCK_OPS_UNALIGNED_VSX,
-    false, true },
+  { "block-ops-unaligned-vsx",	OPTION_MASK_BLOCK_OPS_UNALIGNED_VSX,
+				false, true  },
+  { "block-ops-vector-pair",	OPTION_MASK_BLOCK_OPS_VECTOR_PAIR,
+				false, true  },
   { "cmpb",			OPTION_MASK_CMPB,	false, true  },
   { "crypto",			OPTION_MASK_CRYPTO,	false, true  },
   { "direct-move",		OPTION_MASK_DIRECT_MOVE,	false, true  },
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d3e740e930..b2a70e88ca8 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -328,6 +328,10 @@ mblock-ops-unaligned-vsx
 Target Report Mask(BLOCK_OPS_UNALIGNED_VSX) Var(rs6000_isa_flags)
 Generate unaligned VSX load/store for inline expansion of memcpy/memmove.
 
+mblock-ops-vector-pair
+Target Undocumented Mask(BLOCK_OPS_VECTOR_PAIR) Var(rs6000_isa_flags)
+Generate unaligned VSX vector pair load/store for inline expansion of memcpy/memmove.
+
 mblock-compare-inline-limit=
 Target Report Var(rs6000_block_compare_inline_limit) Init(63) RejectNegative Joined UInteger Save
 Max number of bytes to compare without loops.
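P.S. For anyone unfamiliar with the variable-length vector
instructions: below is a rough source-level sketch (not part of the
patch) of the operation gen_lxvl_stxvl_move emits as RTL for a tail
of fewer than 16 bytes, written with the ISA 3.0 vec_xl_len /
vec_xst_len intrinsics, which map to lxvl/stxvl:

#include <altivec.h>
#include <stddef.h>

/* Sketch of a variable-length tail copy of LEN (<= 16) bytes:
   lxvl loads LEN bytes into a VSX register, stxvl stores LEN bytes
   back out.  Needs -mcpu=power9 or later for these intrinsics.  */
static void
copy_tail (unsigned char *dst, unsigned char *src, size_t len)
{
  vector unsigned char v = vec_xl_len (src, len);	/* lxvl  */
  vec_xst_len (v, dst, len);				/* stxvl */
}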