From patchwork Tue Mar 25 15:52:06 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kyrylo Tkachov X-Patchwork-Id: 333509 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 866B0140081 for ; Wed, 26 Mar 2014 02:52:59 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; q=dns; s=default; b=P1AWv1kr7b2/wDNi4weU0L0k6V+539Kmeyhqjk/LkC8 +WWxcCecP+5nCBaY6WirQ2zASnKGyxgFXDRYDWTpqCMPbBso1uSclxneZZGX91xF Fvh6H+SJlN//X0paL7Sa2MUknfg0EtRzu8bP9FjIK8Z/32i5oOHVI/ObipWDTJE0 = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; s=default; bh=dDIqATgdRh3tA+4qWbd2f7Bx16g=; b=xxzfSobbC/JZrCf51 8PaCmFZQ2S6H0Bv/nfBEPyNsKLUGfGKArVrYCcQhCn1TZqavB+pH77ZQR5GSFIWt IA+R5gVETo600jmZu8ZRfHschGjA6gP01fuWEkrBpJ2oNCMzb4Jssp7zonjZzAAw En3LIpnx5MLhmO/hYvnQ9xQNu0= Received: (qmail 8758 invoked by alias); 25 Mar 2014 15:52:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 8693 invoked by uid 89); 25 Mar 2014 15:52:12 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 25 Mar 2014 15:52:10 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Tue, 25 Mar 2014 15:52:07 +0000 Received: from [10.1.208.24] ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 25 Mar 2014 15:52:21 +0000 Message-ID: <5331A626.2010306@arm.com> Date: Tue, 25 Mar 2014 15:52:06 +0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130804 Thunderbird/17.0.8 MIME-Version: 1.0 To: GCC Patches CC: Ramana Radhakrishnan , Richard Earnshaw , Marcus Shawcroft Subject: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description X-MC-Unique: 114032515520709001 X-IsSubscribed: yes Hi all, In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC sequences of the form: AESE Vn, _ AESMC Vn, Vn will issue both instructions in a single cycle on super-scalar implementations. It would be nice to model that in our pipeline descriptions. This patch defines a function to detect such pairs and uses it in the pipeline description for these instructions for the Cortex-A53. The patch also adds some missed AdvancedSIMD information to the pipeline description for the Cortex-A53. Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-none-linux-gnu. Cortex-A53 scheduling is the default scheduling description on aarch64 so this patch can change default behaviour. That's an argument for taking this in stage1 or maybe backporting it into 4.9.1 once the release is made. What do people think? Thanks, Kyrill 2014-03-25 Kyrylo Tkachov * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function. * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue): Declare extern. * config/arm/cortex-a53.md: Add reservations and bypass for crypto instructions as well as AdvancedSIMD loads. commit b89802221229e8ca7fac5fcd6d552392301edde0 Author: Kyrylo Tkachov Date: Mon Jan 27 11:29:44 2014 +0000 Crypto scheduling for A53 diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 056fe56..2b33626 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -23,6 +23,7 @@ #ifndef GCC_AARCH_COMMON_PROTOS_H #define GCC_AARCH_COMMON_PROTOS_H +extern int aarch_crypto_can_dual_issue (rtx, rtx); extern int arm_early_load_addr_dep (rtx, rtx); extern int arm_early_store_addr_dep (rtx, rtx); extern int arm_mac_accumulator_is_mul_result (rtx, rtx); diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c index c11f7e9..17c4924 100644 --- a/gcc/config/arm/aarch-common.c +++ b/gcc/config/arm/aarch-common.c @@ -31,6 +31,42 @@ #include "c-family/c-common.h" #include "rtl.h" +/* In ARMv8-A there's a general expectation that AESE/AESMC + and AESD/AESIMC sequences of the form: + + AESE Vn, _ + AESMC Vn, Vn + + will issue both instructions in a single cycle on super-scalar + implementations. This function identifies such pairs. */ + +int +aarch_crypto_can_dual_issue (rtx producer, rtx consumer) +{ + rtx producer_src, consumer_src; + + producer = single_set (producer); + consumer = single_set (consumer); + + producer_src = producer ? SET_SRC (producer) : NULL; + consumer_src = consumer ? SET_SRC (consumer) : NULL; + + if (producer_src && consumer_src + && GET_CODE (producer_src) == UNSPEC && GET_CODE (consumer_src) == UNSPEC + && ((XINT (producer_src, 1) == UNSPEC_AESE + && XINT (consumer_src, 1) == UNSPEC_AESMC) + || (XINT (producer_src, 1) == UNSPEC_AESD + && XINT (consumer_src, 1) == UNSPEC_AESIMC))) + { + unsigned int regno = REGNO (SET_DEST (producer)); + + return REGNO (SET_DEST (consumer)) == regno + && REGNO (XVECEXP (consumer_src, 0, 0)) == regno; + } + + return 0; +} + typedef struct { rtx_code search_code; diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md index deae8eb..b131c81 100644 --- a/gcc/config/arm/cortex-a53.md +++ b/gcc/config/arm/cortex-a53.md @@ -61,6 +61,11 @@ (define_cpu_unit "cortex_a53_fp_div_sqrt" "cortex_a53") +;; The Advanced SIMD pipelines. + +(define_cpu_unit "cortex_a53_simd0" "cortex_a53") +(define_cpu_unit "cortex_a53_simd1" "cortex_a53") + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ALU instructions. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -248,6 +253,39 @@ "cortex_a53_slot0, cortex_a53_fp_div_sqrt * 28") ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; ARMv8-A Cryptographic extensions. +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define_insn_reservation "cortex_a53_crypto_aese" 2 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "crypto_aese")) + "cortex_a53_simd0") + +(define_insn_reservation "cortex_a53_crypto_aesmc" 2 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "crypto_aesmc")) + "cortex_a53_simd0 | cortex_a53_simd1") + +(define_insn_reservation "cortex_a53_crypto_sha1_fast" 2 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "crypto_sha1_fast, crypto_sha256_fast")) + "cortex_a53_simd0") + +(define_insn_reservation "cortex_a53_crypto_sha1_xor" 3 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "crypto_sha1_xor")) + "cortex_a53_simd0") + +(define_insn_reservation "cortex_a53_crypto_sha_slow" 5 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "crypto_sha1_slow, crypto_sha256_slow")) + "cortex_a53_simd0") + +(define_bypass 0 "cortex_a53_crypto_aese" + "cortex_a53_crypto_aesmc" + "aarch_crypto_can_dual_issue") + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; VFP to/from core transfers. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -284,6 +322,16 @@ (eq_attr "type" "f_loadd")) "cortex_a53_slot0") +(define_insn_reservation "cortex_a53_f_load_2reg" 5 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "neon_load2_2reg_q")) + "(cortex_a53_slot_any+cortex_a53_ls)*2") + +(define_insn_reservation "cortex_a53_f_loadq" 5 + (and (eq_attr "tune" "cortexa53") + (eq_attr "type" "neon_load1_1reg_q")) + "cortex_a53_slot_any+cortex_a53_ls") + (define_insn_reservation "cortex_a53_f_stores" 0 (and (eq_attr "tune" "cortexa53") (eq_attr "type" "f_stores")) @@ -307,3 +355,11 @@ cortex_a53_fdivs, cortex_a53_fdivd,\ cortex_a53_f2r") +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;; Crude Advanced SIMD approximation. +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define_insn_reservation "cortex_53_advsimd" 4 + (and (eq_attr "tune" "cortexa53") + (eq_attr "is_neon_type" "yes")) + "cortex_a53_simd0")