From patchwork Thu Nov 5 15:58:47 2020
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 1395080
To: GCC Patches
Subject: [PATCH] AArch64: Improve inline memcpy expansion
Date: Thu, 5 Nov 2020 15:58:47 +0000
List-Id: Gcc-patches mailing list
From: Wilco Dijkstra
Cc: Richard Sandiford

Improve the inline memcpy expansion.  Use integer load/store for copies <= 24
bytes instead of SIMD.  Set the maximum copy to expand to 256 by default,
except that -Os or no Neon expands up to 128 bytes.
When using LDP/STP of Q-registers, also use Q-register accesses for the
unaligned tail, saving 2 instructions (e.g. all sizes up to 48 bytes emit
exactly 4 instructions).  Clean up code and comments.

The codesize gain vs the GCC10 expansion is 0.05% on SPECINT2017.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2020-11-03  Wilco Dijkstra

        * config/aarch64/aarch64.c (aarch64_expand_cpymem): Cleanup code and
        comments, tweak expansion decisions and improve tail expansion.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 41e2a699108146e0fa7464743607bd34e91ea9eb..9487c1cb07b0d851c0f085262179470d0d596116 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21255,35 +21255,36 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
 bool
 aarch64_expand_cpymem (rtx *operands)
 {
-  /* These need to be signed as we need to perform arithmetic on n as
-     signed operations.  */
-  int n, mode_bits;
+  int mode_bits;
   rtx dst = operands[0];
   rtx src = operands[1];
   rtx base;
-  machine_mode cur_mode = BLKmode, next_mode;
-  bool speed_p = !optimize_function_for_size_p (cfun);
+  machine_mode cur_mode = BLKmode;

-  /* When optimizing for size, give a better estimate of the length of a
-     memcpy call, but use the default otherwise.  Moves larger than 8 bytes
-     will always require an even number of instructions to do now.  And each
-     operation requires both a load+store, so divide the max number by 2.  */
-  unsigned int max_num_moves = (speed_p ? 16 : AARCH64_CALL_RATIO) / 2;
-
-  /* We can't do anything smart if the amount to copy is not constant.  */
+  /* Only expand fixed-size copies.  */
   if (!CONST_INT_P (operands[2]))
     return false;

-  unsigned HOST_WIDE_INT tmp = INTVAL (operands[2]);
+  unsigned HOST_WIDE_INT size = INTVAL (operands[2]);

-  /* Try to keep the number of instructions low.  For all cases we will do at
-     most two moves for the residual amount, since we'll always overlap the
-     remainder.  */
-  if (((tmp / 16) + (tmp % 16 ? 2 : 0)) > max_num_moves)
+  /* Inline up to 256 bytes when optimizing for speed.  */
+  unsigned HOST_WIDE_INT max_copy_size = 256;
+
+  if (optimize_function_for_size_p (cfun) || !TARGET_SIMD)
+    max_copy_size = 128;
+
+  if (size > max_copy_size)
     return false;

-  /* At this point tmp is known to have to fit inside an int.  */
-  n = tmp;
+  int copy_bits = 256;
+
+  /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD
+     support or slow 256-bit LDP/STP fall back to 128-bit chunks.  */
+  if (size <= 24 || !TARGET_SIMD
+      || (size <= (max_copy_size / 2)
+          && (aarch64_tune_params.extra_tuning_flags
+              & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)))
+    copy_bits = GET_MODE_BITSIZE (TImode);

   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
@@ -21291,15 +21292,8 @@ aarch64_expand_cpymem (rtx *operands)
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);

-  /* Convert n to bits to make the rest of the code simpler.  */
-  n = n * BITS_PER_UNIT;
-
-  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
-     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and TARGET_SIMD.  */
-  const int copy_limit = ((aarch64_tune_params.extra_tuning_flags
-                           & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
-                          || !TARGET_SIMD)
-                         ? GET_MODE_BITSIZE (TImode) : 256;
+  /* Convert size to bits to make the rest of the code simpler.  */
+  int n = size * BITS_PER_UNIT;

   while (n > 0)
     {
@@ -21307,23 +21301,26 @@ aarch64_expand_cpymem (rtx *operands)
          or writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
+        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_bits))
           cur_mode = mode_iter.require ();

       gcc_assert (cur_mode != BLKmode);

       mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
+
+      /* Prefer Q-register accesses for the last bytes.  */
+      if (mode_bits == 128 && copy_bits == 256)
+        cur_mode = V4SImode;
+
       aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode);

       n -= mode_bits;

-      /* Do certain trailing copies as overlapping if it's going to be
-         cheaper.  i.e. less instructions to do so.  For instance doing a 15
-         byte copy it's more efficient to do two overlapping 8 byte copies than
-         8 + 6 + 1.  */
-      if (n > 0 && n <= 8 * BITS_PER_UNIT)
+      /* Emit trailing copies using overlapping unaligned accesses - this is
+         smaller and faster.  */
+      if (n > 0 && n < copy_bits / 2)
         {
-          next_mode = smallest_mode_for_size (n, MODE_INT);
+          machine_mode next_mode = smallest_mode_for_size (n, MODE_INT);
           int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
           src = aarch64_move_pointer (src, (n - n_bits) / BITS_PER_UNIT);
           dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
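
For readers who want to check which accesses the new expansion decisions
produce for a given size, here is a rough standalone model in plain C. It is
only a sketch and not GCC code: plan_copy, struct access, largest_mode_bits,
smallest_mode_bits and the three boolean parameters are hypothetical names
standing in for optimize_function_for_size_p, TARGET_SIMD and
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS. It assumes the available integer modes
are the power-of-two widths from 8 to 256 bits, and that the remaining count
is set to the widened tail size after the pointers are moved back (that
statement lies outside the quoted hunk).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One load/store pair emitted by the modelled expansion.  */
struct access {
  int offset;  /* byte offset from the start of the copy */
  int bytes;   /* width of the access in bytes */
  bool qreg;   /* true if modelled as a Q-register (SIMD) access */
};

#define MAX_ACCESSES 32

/* Hypothetical helper: largest power-of-two width in bits (8..256) that
   does not exceed LIMIT.  Assumes LIMIT >= 8.  */
static int largest_mode_bits(int limit) {
  int bits = 8;
  while (bits * 2 <= limit && bits < 256)
    bits *= 2;
  return bits;
}

/* Hypothetical helper: smallest power-of-two width in bits that holds N.  */
static int smallest_mode_bits(int n) {
  int bits = 8;
  while (bits < n)
    bits *= 2;
  return bits;
}

/* Returns -1 if the copy would be left to a memcpy call, otherwise the
   number of accesses written to OUT.  */
static int plan_copy(int size, bool optimize_size, bool have_simd,
                     bool slow_qreg_ldp, struct access out[MAX_ACCESSES]) {
  /* Inline up to 256 bytes for speed, 128 for size or without SIMD.  */
  int max_copy_size = (optimize_size || !have_simd) ? 128 : 256;
  if (size < 0 || size > max_copy_size)
    return -1;

  /* 256-bit chunks by default; small copies, no SIMD, or slow Q-register
     LDP/STP on copies up to half the limit fall back to 128-bit chunks.  */
  int copy_bits = 256;
  if (size <= 24 || !have_simd
      || (size <= max_copy_size / 2 && slow_qreg_ldp))
    copy_bits = 128;

  int n = size * 8;  /* remaining bits */
  int offset = 0;
  int count = 0;
  while (n > 0) {
    int mode_bits = largest_mode_bits(n < copy_bits ? n : copy_bits);
    assert(count < MAX_ACCESSES);
    out[count].offset = offset;
    out[count].bytes = mode_bits / 8;
    /* 256-bit chunks are Q-register pairs; 128-bit chunks use a Q register
       only when 256-bit chunks are enabled.  */
    out[count].qreg = mode_bits == 256
                      || (mode_bits == 128 && copy_bits == 256);
    count++;
    offset += mode_bits / 8;
    n -= mode_bits;

    /* Overlapping tail: widen the remainder to a full mode and step the
       offset back so the access ends exactly at the end of the copy.  */
    if (n > 0 && n < copy_bits / 2) {
      int n_bits = smallest_mode_bits(n);
      offset += (n - n_bits) / 8;  /* zero or negative */
      n = n_bits;                  /* assumed: not shown in the hunk */
    }
  }
  return count;
}
```

Under these assumptions, a 47-byte copy with SIMD and default tuning comes out
as a 32-byte access at offset 0 followed by an overlapping 16-byte Q-register
access at offset 31, i.e. two load/store pairs. That is consistent with the
"exactly 4 instructions" figure above. A 15-byte copy comes out as two
overlapping 8-byte integer accesses at offsets 0 and 7.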