From patchwork Wed Jun  7 11:38:17 2017
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 772370
From: Tamar Christina
To: GCC Patches
CC: nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft
Subject: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access
Date: Wed, 7 Jun 2017 11:38:17 +0000
Hi All,

This patch allows larger bitsizes to be used as the copy size when the target
does not have SLOW_UNALIGNED_ACCESS.  It also provides an optimized routine
for MEM to REG copying which avoids reconstructing the value piecewise on the
stack and instead uses a combination of shifts and ORs.
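As a point of reference, the kind of source affected is a small aggregate
copied out of memory into registers.  The snippet below is a hypothetical
sketch of such a testcase (the struct name and layout are invented for this
mail, not taken from the patch or the testsuite), shown only to make the
codegen comparison below concrete:

    /* Hypothetical example: a 12-byte aggregate of 1-byte elements.
       Returning it by value forces a BLKmode-to-register copy, which on
       AArch64 comes back in x0/x1.  */
    struct bytes12
    {
      unsigned char b[12];
    };

    struct bytes12 global_val;

    struct bytes12
    get_bytes12 (void)
    {
      return global_val;
    }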
With this patch, loading a struct of 12 1-byte elements now generates:

        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
        sub     sp, sp, #16
        ldr     w1, [x0, 120]
        str     w1, [sp, 8]
        ldr     x0, [x0, 112]
        ldr     x1, [sp, 8]
        add     sp, sp, 16

instead of:

        adrp    x3, .LANCHOR0
        add     x3, x3, :lo12:.LANCHOR0
        mov     x0, 0
        mov     x1, 0
        sub     sp, sp, #16
        ldr     x2, [x3, 112]
        ldr     w3, [x3, 120]
        add     sp, sp, 16
        ubfx    x5, x2, 8, 8
        bfi     x0, x2, 0, 8
        ubfx    x4, x2, 16, 8
        lsr     w9, w2, 24
        bfi     x0, x5, 8, 8
        ubfx    x7, x2, 32, 8
        ubfx    x5, x2, 40, 8
        ubfx    x8, x3, 8, 8
        bfi     x0, x4, 16, 8
        bfi     x1, x3, 0, 8
        ubfx    x4, x2, 48, 8
        ubfx    x6, x3, 16, 8
        bfi     x0, x9, 24, 8
        bfi     x1, x8, 8, 8
        lsr     x2, x2, 56
        lsr     w3, w3, 24
        bfi     x0, x7, 32, 8
        bfi     x1, x6, 16, 8
        bfi     x0, x5, 40, 8
        bfi     x1, x3, 24, 8
        bfi     x0, x4, 48, 8
        bfi     x0, x2, 56, 8

and it now generates:

        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
        sub     sp, sp, #16
        ldrb    w1, [x0, 18]
        ldrh    w0, [x0, 16]
        orr     w0, w0, w1, lsr 16
        str     w0, [sp, 8]
        add     sp, sp, 16

instead of:

        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        mov     x0, 0
        sub     sp, sp, #16
        ldrh    w1, [x2, 16]
        ldrb    w2, [x2, 18]
        add     sp, sp, 16
        bfi     x0, x1, 0, 8
        ubfx    x1, x1, 8, 8
        bfi     x0, x1, 8, 8
        bfi     x0, x2, 16, 8

These changes only have an effect on structures smaller than 16 bytes.  The
remaining stack stores come from an existing, incomplete data-flow analysis
which thinks the value on the stack is still being used and so does not mark
it as dead.

Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no
regressions.

OK for trunk?

Thanks,
Tamar

gcc/
2017-06-07  Tamar Christina

	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
	with fast unaligned access.
	* config/aarch64/aarch64.c (aarch64_expand_movmem): Add MEM
	to REG optimized case.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4f769a40a4e9de83cb5aacfd3ff58301c2feeb78..8906d9a9445ed36f43302708d1f6212bcf017bdc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13498,6 +13498,41 @@ aarch64_expand_movmem (rtx *operands)
   base = copy_to_mode_reg (Pmode, XEXP (src, 0));
   src = adjust_automodify_address (src, VOIDmode, base, 0);
 
+  /* Optimize routines for MEM to REG copies.  */
+  if (n < 8 && !REG_P (XEXP (operands[0], 0)))
+    {
+      unsigned int max_align = UINTVAL (operands[2]);
+      max_align = n < max_align ? max_align : n;
+      machine_mode mov_mode, dest_mode
+	= smallest_mode_for_size (max_align * BITS_PER_UNIT, MODE_INT);
+      rtx result = gen_reg_rtx (dest_mode);
+      emit_insn (gen_move_insn (result, GEN_INT (0)));
+
+      unsigned int shift_cnt = 0;
+      for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode))
+	{
+	  int nearest = 0;
+	  /* Find the mode to use, but limit the max to TI mode.  */
+	  for (unsigned max = 1; max <= (n - shift_cnt) && max <= 16; max *= 2)
+	    nearest = max;
+
+	  mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT, MODE_INT);
+	  rtx reg = gen_reg_rtx (mov_mode);
+
+	  src = adjust_address (src, mov_mode, 0);
+	  emit_insn (gen_move_insn (reg, src));
+	  src = aarch64_progress_pointer (src);
+
+	  reg = gen_rtx_ASHIFT (dest_mode, reg,
+				GEN_INT (shift_cnt * BITS_PER_UNIT));
+	  result = gen_rtx_IOR (dest_mode, reg, result);
+	}
+
+      dst = adjust_address (dst, dest_mode, 0);
+      emit_insn (gen_move_insn (dst, result));
+      return true;
+    }
+
   /* Simple cases.  Copy 0-3 bytes, as (if applicable) a 2-byte, then a
      1-byte chunk.  */
   if (n < 4)

diff --git a/gcc/expr.c b/gcc/expr.c
index 91d7ea217229fac62380b5d4b646961bf7c836c1..b1df4651e7942346007cda1cce8ee5a19297ab16 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2743,7 +2743,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (SLOW_UNALIGNED_ACCESS (BLKmode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
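As an aside on the intent of the new MEM-to-REG path: the shift-and-OR
combining it emits corresponds, conceptually, to the C below.  This is a
hand-written illustration only; the function is invented for this mail and is
neither part of the patch nor generated by it.

    /* Conceptual sketch: build a 3-byte value from one 2-byte chunk and one
       shifted 1-byte chunk, merged with an OR, rather than inserting each
       byte separately with bfi/ubfx as before.  */
    unsigned int
    merge3 (const unsigned char *p)
    {
      unsigned int lo = (unsigned int) p[0] | ((unsigned int) p[1] << 8);
      unsigned int hi = (unsigned int) p[2] << 16;
      return lo | hi;
    }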