Patchwork [ARM/AArch64,2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description

login
register
mail settings
Submitter Kyrylo Tkachov
Date March 25, 2014, 3:52 p.m.
Message ID <5331A626.2010306@arm.com>
Download mbox | patch
Permalink /patch/333509/
State New
Headers show

Comments

Kyrylo Tkachov - March 25, 2014, 3:52 p.m.
Hi all,

In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC 
sequences of the form:

AESE Vn, _
AESMC Vn, Vn

will issue both instructions in a single cycle on super-scalar implementations. 
It would be nice to model that in our pipeline descriptions. This patch defines 
a function to detect such pairs and uses it in the pipeline description for 
these instructions for the Cortex-A53.

The patch also adds some missed AdvancedSIMD information to the pipeline 
description for the Cortex-A53.

Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-none-linux-gnu.

Cortex-A53 scheduling is the default scheduling description on aarch64 so this 
patch can change default behaviour. That's an argument for taking this in stage1 
or maybe backporting it into 4.9.1 once the release is made.

What do people think?

Thanks,
Kyrill


2014-03-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function.
     * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue): Declare
     extern.
     * config/arm/cortex-a53.md: Add reservations and bypass for crypto
     instructions as well as AdvancedSIMD loads.
Ramana Radhakrishnan - March 28, 2014, 2:52 p.m.
On Tue, Mar 25, 2014 at 3:52 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC
> sequences of the form:
>
> AESE Vn, _
> AESMC Vn, Vn
>
> will issue both instructions in a single cycle on super-scalar
> implementations. It would be nice to model that in our pipeline
> descriptions. This patch defines a function to detect such pairs and uses it
> in the pipeline description for these instructions for the Cortex-A53.
>
> The patch also adds some missed AdvancedSIMD information to the pipeline
> description for the Cortex-A53.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf and
> aarch64-none-linux-gnu.
>
> Cortex-A53 scheduling is the default scheduling description on aarch64 so
> this patch can change default behaviour. That's an argument for taking this
> in stage1 or maybe backporting it into 4.9.1 once the release is made.


To my mind on ARM / AArch64 this actually helps anyone using the
crypto intrinsics on A53 hardware today and it would be good to get
this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
this is not a  default tuning option for any large software vendors,
the folks using this are typically the ones that write the more
specialized crypto intrinsics rather than just general purpose code.
However this will help with scheduling on what is essentially an
in-order core, so would be nice to have.

This would definitely need approval from the AArch64 maintainers and
the RMs to go in at this stage.

If not, we should consider this for 4.9.1

regards
Ramana

>
> What do people think?
>
> Thanks,
> Kyrill
>
>
> 2014-03-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function.
>     * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue):
> Declare
>     extern.
>     * config/arm/cortex-a53.md: Add reservations and bypass for crypto
>     instructions as well as AdvancedSIMD loads.
Richard Guenther - March 28, 2014, 2:57 p.m.
On Fri, 28 Mar 2014, Ramana Radhakrishnan wrote:

> On Tue, Mar 25, 2014 at 3:52 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> > Hi all,
> >
> > In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC
> > sequences of the form:
> >
> > AESE Vn, _
> > AESMC Vn, Vn
> >
> > will issue both instructions in a single cycle on super-scalar
> > implementations. It would be nice to model that in our pipeline
> > descriptions. This patch defines a function to detect such pairs and uses it
> > in the pipeline description for these instructions for the Cortex-A53.
> >
> > The patch also adds some missed AdvancedSIMD information to the pipeline
> > description for the Cortex-A53.
> >
> > Bootstrapped and tested on arm-none-linux-gnueabihf and
> > aarch64-none-linux-gnu.
> >
> > Cortex-A53 scheduling is the default scheduling description on aarch64 so
> > this patch can change default behaviour. That's an argument for taking this
> > in stage1 or maybe backporting it into 4.9.1 once the release is made.
> 
> 
> To my mind on ARM / AArch64 this actually helps anyone using the
> crypto intrinsics on A53 hardware today and it would be good to get
> this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
> this is not a  default tuning option for any large software vendors,
> the folks using this are typically the ones that write the more
> specialized crypto intrinsics rather than just general purpose code.
> However this will help with scheduling on what is essentially an
> in-order core, so would be nice to have.
> 
> This would definitely need approval from the AArch64 maintainers and
> the RMs to go in at this stage.
> 
> If not, we should consider this for 4.9.1

I'd rather have it in 4.9.0 than 4.9.1.

Richard.

> regards
> Ramana
> 
> >
> > What do people think?
> >
> > Thanks,
> > Kyrill
> >
> >
> > 2014-03-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> >
> >     * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function.
> >     * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue):
> > Declare
> >     extern.
> >     * config/arm/cortex-a53.md: Add reservations and bypass for crypto
> >     instructions as well as AdvancedSIMD loads.
> 
>
Marcus Shawcroft - March 28, 2014, 4:52 p.m.
On 28 March 2014 14:52, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:

> To my mind on ARM / AArch64 this actually helps anyone using the
> crypto intrinsics on A53 hardware today and it would be good to get
> this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
> this is not a  default tuning option for any large software vendors,
> the folks using this are typically the ones that write the more
> specialized crypto intrinsics rather than just general purpose code.
> However this will help with scheduling on what is essentially an
> in-order core, so would be nice to have.
>
> This would definitely need approval from the AArch64 maintainers and
> the RMs to go in at this stage.
>
> If not, we should consider this for 4.9.1
>
> regards
> Ramana
>
>>
>> What do people think?

Its low risk, I think it should go in now.

/Marcus
Jakub Jelinek - March 28, 2014, 5:18 p.m.
On Fri, Mar 28, 2014 at 04:52:39PM +0000, Marcus Shawcroft wrote:
> On 28 March 2014 14:52, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> 
> > To my mind on ARM / AArch64 this actually helps anyone using the
> > crypto intrinsics on A53 hardware today and it would be good to get
> > this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
> > this is not a  default tuning option for any large software vendors,
> > the folks using this are typically the ones that write the more
> > specialized crypto intrinsics rather than just general purpose code.
> > However this will help with scheduling on what is essentially an
> > in-order core, so would be nice to have.
> >
> > This would definitely need approval from the AArch64 maintainers and
> > the RMs to go in at this stage.
> >
> > If not, we should consider this for 4.9.1
> >
> > regards
> > Ramana
> >
> >>
> >> What do people think?
> 
> Its low risk, I think it should go in now.

Ok with me.

	Jakub

Patch

commit b89802221229e8ca7fac5fcd6d552392301edde0
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Jan 27 11:29:44 2014 +0000

     Crypto scheduling for A53

diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 056fe56..2b33626 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -23,6 +23,7 @@ 
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+extern int aarch_crypto_can_dual_issue (rtx, rtx);
 extern int arm_early_load_addr_dep (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index c11f7e9..17c4924 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -31,6 +31,42 @@ 
 #include "c-family/c-common.h"
 #include "rtl.h"
 
+/* In ARMv8-A there's a general expectation that AESE/AESMC
+   and AESD/AESIMC sequences of the form:
+
+   AESE Vn, _
+   AESMC Vn, Vn
+
+   will issue both instructions in a single cycle on super-scalar
+   implementations.  This function identifies such pairs.  */
+
+int
+aarch_crypto_can_dual_issue (rtx producer, rtx consumer)
+{
+  rtx producer_src, consumer_src;
+
+  producer = single_set (producer);
+  consumer = single_set (consumer);
+
+  producer_src = producer ? SET_SRC (producer) : NULL;
+  consumer_src = consumer ? SET_SRC (consumer) : NULL;
+
+  if (producer_src && consumer_src
+      && GET_CODE (producer_src) == UNSPEC && GET_CODE (consumer_src) == UNSPEC
+      && ((XINT (producer_src, 1) == UNSPEC_AESE
+           && XINT (consumer_src, 1) == UNSPEC_AESMC)
+          || (XINT (producer_src, 1) == UNSPEC_AESD
+              && XINT (consumer_src, 1) == UNSPEC_AESIMC)))
+  {
+    unsigned int regno = REGNO (SET_DEST (producer));
+
+    return REGNO (SET_DEST (consumer)) == regno
+           && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
+  }
+
+  return 0;
+}
+
 typedef struct
 {
   rtx_code search_code;
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index deae8eb..b131c81 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -61,6 +61,11 @@ 
 
 (define_cpu_unit "cortex_a53_fp_div_sqrt" "cortex_a53")
 
+;; The Advanced SIMD pipelines.
+
+(define_cpu_unit "cortex_a53_simd0" "cortex_a53")
+(define_cpu_unit "cortex_a53_simd1" "cortex_a53")
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; ALU instructions.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -248,6 +253,39 @@ 
   "cortex_a53_slot0, cortex_a53_fp_div_sqrt * 28")
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; ARMv8-A Cryptographic extensions.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn_reservation "cortex_a53_crypto_aese" 2
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "crypto_aese"))
+  "cortex_a53_simd0")
+
+(define_insn_reservation "cortex_a53_crypto_aesmc" 2
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "crypto_aesmc"))
+  "cortex_a53_simd0 | cortex_a53_simd1")
+
+(define_insn_reservation "cortex_a53_crypto_sha1_fast" 2
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "crypto_sha1_fast, crypto_sha256_fast"))
+  "cortex_a53_simd0")
+
+(define_insn_reservation "cortex_a53_crypto_sha1_xor" 3
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "crypto_sha1_xor"))
+  "cortex_a53_simd0")
+
+(define_insn_reservation "cortex_a53_crypto_sha_slow" 5
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "crypto_sha1_slow, crypto_sha256_slow"))
+  "cortex_a53_simd0")
+
+(define_bypass 0 "cortex_a53_crypto_aese"
+                 "cortex_a53_crypto_aesmc"
+                 "aarch_crypto_can_dual_issue")
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; VFP to/from core transfers.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
@@ -284,6 +322,16 @@ 
        (eq_attr "type" "f_loadd"))
   "cortex_a53_slot0")
 
+(define_insn_reservation "cortex_a53_f_load_2reg" 5
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "neon_load2_2reg_q"))
+  "(cortex_a53_slot_any+cortex_a53_ls)*2")
+
+(define_insn_reservation "cortex_a53_f_loadq" 5
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "type" "neon_load1_1reg_q"))
+  "cortex_a53_slot_any+cortex_a53_ls")
+
 (define_insn_reservation "cortex_a53_f_stores" 0
   (and (eq_attr "tune" "cortexa53")
        (eq_attr "type" "f_stores"))
@@ -307,3 +355,11 @@ 
 		  cortex_a53_fdivs, cortex_a53_fdivd,\
 		  cortex_a53_f2r")
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Crude Advanced SIMD approximation.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_insn_reservation "cortex_53_advsimd" 4
+  (and (eq_attr "tune" "cortexa53")
+       (eq_attr "is_neon_type" "yes"))
+  "cortex_a53_simd0")