From patchwork Fri Feb 9 07:32:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Siddhesh Poyarekar X-Patchwork-Id: 871239 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-472924-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="n/xyeU+X"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zd6J42Z74z9s7v for ; Fri, 9 Feb 2018 18:33:54 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; q=dns; s=default; b=ZJvDkP9dj+A1 nN52AV0KVVJzb9cLqgZjTg6969J8S53XtT12KeE8I40EkkZxXV8zqZTOuBKgEC3C /C8L/7ZKBeOwvpI32rI5ikjkywYgCRmmbzIrf16akf2sFJBY/enYsb58TFElczGz 3s5FO4iHgUMsRHMJd7EvJj9/dYgmcso= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; s=default; bh=68CHbEKvtl0rMVaV57 8p3F/Itcc=; b=n/xyeU+XetFSDuqBgbYohWvGeF5Ee+qEkH/fRI/QmbF11cJoR3 ejF8IAUQccHwdCdEiWWq7i5rcFo83JU6ZNwo5xzO7S97AgBAZ7EzpnTdK5IFK6Q7 zIasVWIxNr7ovP96BcXIzdE05YhieOE3vSBMrnHfmN0S9u6SwRKbxJboU= Received: (qmail 122199 invoked by alias); 9 Feb 2018 07:33:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 122189 invoked by uid 89); 9 Feb 2018 07:33:46 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_NEUTRAL autolearn=ham version=3.3.2 spammy=ry, rw, yy, mw X-HELO: homiemail-a52.g.dreamhost.com Received: from sub5.mail.dreamhost.com (HELO homiemail-a52.g.dreamhost.com) (208.113.200.129) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 09 Feb 2018 07:33:44 +0000 Received: from homiemail-a52.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a52.g.dreamhost.com (Postfix) with ESMTP id 7A22D6003B10; Thu, 8 Feb 2018 23:33:42 -0800 (PST) Received: from devel.in.reserved-bit.com (unknown [202.189.238.75]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by homiemail-a52.g.dreamhost.com (Postfix) with ESMTPSA id AA4886003B0F; Thu, 8 Feb 2018 23:33:40 -0800 (PST) From: Siddhesh Poyarekar To: gcc-patches@gcc.gnu.org Cc: wilson@tuliptree.org, kugan.vivekanandarajah@linaro.org, james.greenhalgh@arm.com, richard.earnshaw@arm.com Subject: [PATCH v3] Disable reg offset in quad-word store for Falkor Date: Fri, 9 Feb 2018 13:02:00 +0530 Message-Id: <20180209073200.29318-1-siddhesh@sourceware.org> Hi, Here's v3 of the patch to disable register offset addressing mode for stores of 128-bit values on Falkor because they're very costly. Following Kyrill's suggestion, I compared the codegen for a53 and found that the codegen was quite different. Jim's original patch is the most minimal compromise for this and is also a cleaner temporary fix before I attempt to split address costs into loads and stores for gcc9. So v3 is essentially a very slightly cleaned up version of v1 again, this time with confirmation that there are no codegen changes in CPU2017 on non-falkor builds; only the codegen for -mcpu=falkor is different. ---- On Falkor, because of an idiosyncracy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register offsets with quad-word stores. This is the most minimal change for gcc8, I will volunteer to make a more lasting change for gcc9 where I split the addressing mode costs into loads and stores wherever possible and needed. This patch improves fpspeed by 0.17% and intspeed by 0.62% in CPU2017, with xalancbmk_s (3.84%) wrf_s (1.46%) and mcf_s (1.62%) being the biggest winners. There were no regressions beyond 0.4%. 2018-xx-xx Jim Wilson Kugan Vivenakandarajah Siddhesh Poyarekar gcc/ * config/aarch64/aarch64-protos.h (aarch64_movti_target_operand_p): New. * config/aarch64/aarch64-simd.md (aarch64_simd_mov): Use Utf. * config/aarch64/aarch64-tuning-flags.def (SLOW_REGOFFSET_QUADWORD_STORE): New. * config/aarch64/aarch64.c (qdf24xx_tunings): Add SLOW_REGOFFSET_QUADWORD_STORE to tuning flags. (aarch64_movti_target_operand_p): New. * config/aarch64/aarch64.md (movti_aarch64): Use Utf. (movtf_aarch64): Likewise. * config/aarch64/constraints.md (Utf): New. gcc/testsuite * gcc.target/aarch64/pr82533.c: New test case. --- gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64-tuning-flags.def | 4 ++++ gcc/config/aarch64/aarch64.c | 14 +++++++++++++- gcc/config/aarch64/aarch64.md | 8 ++++---- gcc/config/aarch64/constraints.md | 6 ++++++ gcc/testsuite/gcc.target/aarch64/pr82533.c | 11 +++++++++++ 7 files changed, 41 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index cda2895d28e..5a0323deb1e 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -433,6 +433,7 @@ bool aarch64_simd_mem_operand_p (rtx); bool aarch64_sve_ld1r_operand_p (rtx); bool aarch64_sve_ldr_operand_p (rtx); bool aarch64_sve_struct_memory_operand_p (rtx); +bool aarch64_movti_target_operand_p (rtx); rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool); rtx aarch64_tls_get_addr (void); tree aarch64_fold_builtin (tree, int, tree *, bool); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3d1f6a01cb7..f7daac3e28d 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -131,9 +131,9 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Umq, m, w, ?r, ?w, ?r, w") + "=w, Umq, Utf, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" - "m, Dz, w, w, w, r, r, Dn"))] + "m, Dz, w, w, w, r, r, Dn"))] "TARGET_SIMD && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index ea9ead234cb..04baf5b6de6 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) are not considered cheap. */ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND) +/* Don't use a register offset in a memory address for a quad-word store. */ +AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store", + SLOW_REGOFFSET_QUADWORD_STORE) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 228fd1b908d..c0a05598415 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -875,7 +875,7 @@ static const struct tune_params qdf24xx_tunings = 2, /* min_div_recip_mul_df. */ 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE), /* tune_flags. */ &qdf24xx_prefetch_tune }; @@ -13634,6 +13634,18 @@ aarch64_sve_struct_memory_operand_p (rtx op) && offset_4bit_signed_scaled_p (SVE_BYTE_MODE, last)); } +/* Return TRUE if OP uses an efficient memory address for quad-word target. */ +bool +aarch64_movti_target_operand_p (rtx op) +{ + if (! optimize_size + && (aarch64_tune_params.extra_tuning_flags + & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE)) + return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS + && ! CONST_INT_P (XEXP (XEXP (op, 0), 1))); + return MEM_P (op); +} + /* Emit a register copy from operand to operand, taking care not to early-clobber source registers in the process. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5a2a9309a3b..1e6edcf51f2 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1080,9 +1080,9 @@ (define_insn "*movti_aarch64" [(set (match_operand:TI 0 - "nonimmediate_operand" "= r,w, r,w,r,m,m,w,m") + "nonimmediate_operand" "= r,w, r,w,r,m,m,w,Utf") (match_operand:TI 1 - "aarch64_movti_operand" " rUti,r, w,w,m,r,Z,m,w"))] + "aarch64_movti_operand" " rUti,r, w,w,m,r,Z,m, w"))] "(register_operand (operands[0], TImode) || aarch64_reg_or_zero (operands[1], TImode))" "@ @@ -1227,9 +1227,9 @@ (define_insn "*movtf_aarch64" [(set (match_operand:TF 0 - "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m") + "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,Utf,?r,m ,m") (match_operand:TF 1 - "general_operand" " w,?r, ?r,w ,Y,Y ,m,w,m ,?r,Y"))] + "general_operand" " w,?r, ?r,w ,Y,Y ,m,w ,m ,?r,Y"))] "TARGET_FLOAT && (register_operand (operands[0], TFmode) || aarch64_reg_or_fp_zero (operands[1], TFmode))" "@ diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index f052103e859..35aa62996ae 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -235,6 +235,12 @@ (and (match_code "mem") (match_test "aarch64_sve_ldr_operand_p (op)"))) +(define_memory_constraint "Utf" + "@internal + An efficient memory address for a quad-word target operand." + (and (match_code "mem") + (match_test "aarch64_movti_target_operand_p (op)"))) + (define_memory_constraint "Utv" "@internal An address valid for loading/storing opaque structure diff --git a/gcc/testsuite/gcc.target/aarch64/pr82533.c b/gcc/testsuite/gcc.target/aarch64/pr82533.c new file mode 100644 index 00000000000..fa28ffac03a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr82533.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mcpu=falkor -O2 -ftree-vectorize" } */ + +void +copy (int N, double *c, double *a) +{ + for (int i = 0; i < N; ++i) + c[i] = a[i]; +} + +/* { dg-final { scan-assembler-not "str\tq\[0-9\]+, \\\[x\[0-9\]+, x\[0-9\]+\\\]" } } */