From patchwork Fri Sep 22 15:49:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jim Wilson
X-Patchwork-Id: 817600
From: Jim Wilson
To: gcc-patches@gcc.gnu.org
Cc: Jim Wilson, wilson@tuliptree.org
Subject: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.
Date: Fri, 22 Sep 2017 08:49:17 -0700
Message-Id: <1506095357-3334-1-git-send-email-jim.wilson@linaro.org>

On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
quad-word store using a reg+reg addressing mode is almost twice as slow as an
add followed by a quad-word store with a single reg addressing mode.  So we
get better performance if we disallow addressing modes using register offsets
with quad-word stores.

Using lmbench compiled with -O2 -ftree-vectorize as my benchmark, I see a 13%
performance increase on stream copy and a 16% performance increase on stream
scale with this patch.  I also see a small performance increase on SPEC
CPU2006 of around 0.2% for int and 0.4% for FP at -O3.

	gcc/
	* config/aarch64/aarch64-protos.h (aarch64_movti_target_operand_p): New.
	* config/aarch64/aarch64-simd.md (aarch64_simd_mov<mode>): Use Utf.
	* config/aarch64/aarch64-tuning-flags.def
	(SLOW_REGOFFSET_QUADWORD_STORE): New.
	* config/aarch64/aarch64.c (qdf24xx_tunings): Add
	SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
	(aarch64_movti_target_operand_p): New.
	* config/aarch64/aarch64.md (movti_aarch64): Use Utf.
	(movtf_aarch64): Likewise.
	* config/aarch64/constraints.md (Utf): New.
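For readers who do not want to decode the RTL test in the diff, here is a
minimal standalone sketch of the decision the new predicate makes.  It is a
toy model only: the enum, the function name, and the boolean parameters are
hypothetical stand-ins for GCC's rtx address, the tuning flag, and
optimize_size, and none of them exist in GCC.

```c
#include <stdbool.h>

/* Hypothetical address shapes for a 128-bit store; this is not GCC's RTL.  */
enum toy_addr_kind
{
  TOY_ADDR_REG,          /* base register only, e.g. [x0]  */
  TOY_ADDR_REG_PLUS_IMM, /* base plus constant, e.g. [x0, #16]  */
  TOY_ADDR_REG_PLUS_REG  /* base plus register, e.g. [x0, x1]  */
};

/* Toy model of the check in the patch.  SLOW_REGOFFSET stands in for the
   SLOW_REGOFFSET_QUADWORD_STORE tuning flag and OPT_SIZE for optimize_size.
   When the flag is set and we are not optimizing for size, reject a base
   plus non-constant offset; otherwise accept every address shape.  */
static bool
toy_quadword_store_addr_ok (enum toy_addr_kind kind, bool slow_regoffset,
			    bool opt_size)
{
  if (!opt_size && slow_regoffset)
    return kind != TOY_ADDR_REG_PLUS_REG;
  return true;
}
```

With the flag set, a rejected reg+reg address is expected to be reloaded into
a single register by an add, which is the faster sequence described above.
When optimizing for size the restriction is skipped, since the extra add
costs an instruction.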
---
 gcc/config/aarch64/aarch64-protos.h         |  1 +
 gcc/config/aarch64/aarch64-simd.md          |  4 ++--
 gcc/config/aarch64/aarch64-tuning-flags.def |  4 ++++
 gcc/config/aarch64/aarch64.c                | 14 +++++++++++++-
 gcc/config/aarch64/aarch64.md               |  6 +++---
 gcc/config/aarch64/constraints.md           |  6 ++++++
 6 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e67c2ed..2dfd057 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -379,6 +379,7 @@ const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
+bool aarch64_movti_target_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 70e9339..88bf210 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -133,9 +133,9 @@
 
 (define_insn "*aarch64_simd_mov<mode>"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-		"=w, Umq,  m,  w, ?r, ?w, ?r, w")
+		"=w, Umq, Utf,  w, ?r, ?w, ?r, w")
 	(match_operand:VQ 1 "general_operand"
-		"m,  Dz, w,  w,  w,  r,  r, Dn"))]
+		"m,  Dz,   w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
    && (register_operand (operands[0], <MODE>mode)
        || aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index f48642c..7d0b104 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
    are not considered cheap.  */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store.  */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+			     SLOW_REGOFFSET_QUADWORD_STORE)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5e26cb7..d6f1133 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -818,7 +818,7 @@ static const struct tune_params qdf24xx_tunings =
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_STRONG,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE),	/* tune_flags.  */
   &qdf24xx_prefetch_tune
 };
 
@@ -11821,6 +11821,18 @@ aarch64_simd_mem_operand_p (rtx op)
 	      || REG_P (XEXP (op, 0)));
 }
 
+/* Return TRUE if OP uses an efficient memory address for quad-word target.  */
+bool
+aarch64_movti_target_operand_p (rtx op)
+{
+  if (! optimize_size
+      && (aarch64_tune_params.extra_tuning_flags
+	  & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE))
+    return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS
+			    && ! CONST_INT_P (XEXP (XEXP (op, 0), 1)));
+  return MEM_P (op);
+}
+
 /* Emit a register copy from operand to operand, taking care not to
    early-clobber source registers in the process.
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8cdb06..9c7e356 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1023,7 +1023,7 @@
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-	 "nonimmediate_operand"  "=r, w,r,w,r,m,m,w,m")
+	 "nonimmediate_operand"  "=r, w,r,w,r,m,m,w,Utf")
 	(match_operand:TI 1
 	 "aarch64_movti_operand" " rn,r,w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
@@ -1170,9 +1170,9 @@
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m")
+	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,Utf,?r,m ,m")
 	(match_operand:TF 1
-	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,m ,?r,Y"))]
+	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w  ,m ,?r,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
     || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 3649fb4..b1defb6 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -171,6 +171,12 @@
        (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 						  PARALLEL, 1)")))
 
+(define_memory_constraint "Utf"
+  "@internal
+   An efficient memory address for a quad-word target operand."
+  (and (match_code "mem")
+       (match_test "aarch64_movti_target_operand_p (op)")))
+
 (define_memory_constraint "Utv"
   "@internal
    An address valid for loading/storing opaque structure