diff mbox

[ARM/AArch64] Properly cost rev16 operand

Message ID 55433834.10206@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov May 1, 2015, 8:24 a.m. UTC
Hi all,

It occurs to me that in the IOR-of-shifts form of the rev16 operation we should be costing the operand properly.
For that we'd want to reuse the aarch_rev16_p function that does all the heavy lifting and get it to write the
innermost operand of the rev16 for further costing. In the process we relax that function a bit to accept any
rtx as the operand, not just REGs so that we can calculate the cost of moving them in a register appropriately.

This patch does just that and updates the arm and aarch64 callsites appropriately so that the operands are
processed properly.

In practice I don't expect this to make much difference since this patterns occurs rarely anyway, but it seems
like the 'right thing to do' (TM).

Bootstrapped and tested on arm,aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-05-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/arm/aarch-common-protos.h (aarch_rev16_p): Update signature
     to take a second argument.
     * config/arm/aarch-common.c (aarch_rev16_p): Add second argument.
     Write inner-most rev16 argument to it if recognised.
     (aarch_rev16_p_1): Likewise.
     * config/arm/arm.c (arm_new_rtx_costs): Properly cost rev16 operand
     in the IOR case.
     * config/aarch64/aarch64.c (aarch64_rtx_costs): Likewise.

Comments

Kyrylo Tkachov May 12, 2015, 9:07 a.m. UTC | #1
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00013.html

Thanks,
Kyrill
On 01/05/15 09:24, Kyrill Tkachov wrote:
> Hi all,
>
> It occurs to me that in the IOR-of-shifts form of the rev16 operation we should be costing the operand properly.
> For that we'd want to reuse the aarch_rev16_p function that does all the heavy lifting and get it to write the
> innermost operand of the rev16 for further costing. In the process we relax that function a bit to accept any
> rtx as the operand, not just REGs so that we can calculate the cost of moving them in a register appropriately.
>
> This patch does just that and updates the arm and aarch64 callsites appropriately so that the operands are
> processed properly.
>
> In practice I don't expect this to make much difference since this patterns occurs rarely anyway, but it seems
> like the 'right thing to do' (TM).
>
> Bootstrapped and tested on arm,aarch64.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-05-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       * config/arm/aarch-common-protos.h (aarch_rev16_p): Update signature
>       to take a second argument.
>       * config/arm/aarch-common.c (aarch_rev16_p): Add second argument.
>       Write inner-most rev16 argument to it if recognised.
>       (aarch_rev16_p_1): Likewise.
>       * config/arm/arm.c (arm_new_rtx_costs): Properly cost rev16 operand
>       in the IOR case.
>       * config/aarch64/aarch64.c (aarch64_rtx_costs): Likewise.
Kyrylo Tkachov May 21, 2015, 4:59 p.m. UTC | #2
Ping.

Thanks,
Kyrill
On 12/05/15 10:07, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00013.html
>
> Thanks,
> Kyrill
> On 01/05/15 09:24, Kyrill Tkachov wrote:
>> Hi all,
>>
>> It occurs to me that in the IOR-of-shifts form of the rev16 operation we should be costing the operand properly.
>> For that we'd want to reuse the aarch_rev16_p function that does all the heavy lifting and get it to write the
>> innermost operand of the rev16 for further costing. In the process we relax that function a bit to accept any
>> rtx as the operand, not just REGs so that we can calculate the cost of moving them in a register appropriately.
>>
>> This patch does just that and updates the arm and aarch64 callsites appropriately so that the operands are
>> processed properly.
>>
>> In practice I don't expect this to make much difference since this patterns occurs rarely anyway, but it seems
>> like the 'right thing to do' (TM).
>>
>> Bootstrapped and tested on arm,aarch64.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2015-05-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>        * config/arm/aarch-common-protos.h (aarch_rev16_p): Update signature
>>        to take a second argument.
>>        * config/arm/aarch-common.c (aarch_rev16_p): Add second argument.
>>        Write inner-most rev16 argument to it if recognised.
>>        (aarch_rev16_p_1): Likewise.
>>        * config/arm/arm.c (arm_new_rtx_costs): Properly cost rev16 operand
>>        in the IOR case.
>>        * config/aarch64/aarch64.c (aarch64_rtx_costs): Likewise.
James Greenhalgh May 26, 2015, 8:49 a.m. UTC | #3
On Fri, May 01, 2015 at 09:24:20AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> It occurs to me that in the IOR-of-shifts form of the rev16 operation we
> should be costing the operand properly.  For that we'd want to reuse the
> aarch_rev16_p function that does all the heavy lifting and get it to write
> the innermost operand of the rev16 for further costing. In the process we
> relax that function a bit to accept any rtx as the operand, not just REGs so
> that we can calculate the cost of moving them in a register appropriately.
> 
> This patch does just that and updates the arm and aarch64 callsites
> appropriately so that the operands are processed properly.
> 
> In practice I don't expect this to make much difference since this patterns
> occurs rarely anyway, but it seems like the 'right thing to do' (TM).
> 
> Bootstrapped and tested on arm,aarch64.
> 
> Ok for trunk?

Looks OK for aarch64, please wait for an Ack from an arm maintainer.

Cheers,
James

> 2015-05-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * config/arm/aarch-common-protos.h (aarch_rev16_p): Update signature
>      to take a second argument.
>      * config/arm/aarch-common.c (aarch_rev16_p): Add second argument.
>      Write inner-most rev16 argument to it if recognised.
>      (aarch_rev16_p_1): Likewise.
>      * config/arm/arm.c (arm_new_rtx_costs): Properly cost rev16 operand
>      in the IOR case.
>      * config/aarch64/aarch64.c (aarch64_rtx_costs): Likewise.
diff mbox

Patch

commit 297534c524af2aac4c67279909c5e54221a46b22
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Mar 2 17:55:30 2015 +0000

    [ARM/AArch64] Properly cost rev16 operand.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0345b93..6083fd4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6003,9 +6003,9 @@  cost_plus:
       return false;
 
     case IOR:
-      if (aarch_rev16_p (x))
+      if (aarch_rev16_p (x, &op0))
         {
-          *cost = COSTS_N_INSNS (1);
+          *cost += rtx_cost (op0, BSWAP, 0, speed);
 
           if (speed)
             *cost += extra_cost->alu.rev;
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 3ee7ebf..04b1f1d 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -24,7 +24,7 @@ 
 #define GCC_AARCH_COMMON_PROTOS_H
 
 extern int aarch_crypto_can_dual_issue (rtx_insn *, rtx_insn *);
-extern bool aarch_rev16_p (rtx);
+extern bool aarch_rev16_p (rtx, rtx*);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
 extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
 extern int arm_early_load_addr_dep (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index b2bb859..f1c267c 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -181,28 +181,31 @@  aarch_rev16_shleft_mask_imm_p (rtx val, machine_mode mode)
 
 
 static bool
-aarch_rev16_p_1 (rtx lhs, rtx rhs, machine_mode mode)
+aarch_rev16_p_1 (rtx lhs, rtx rhs, machine_mode mode, rtx *op)
 {
   if (GET_CODE (lhs) == AND
          && GET_CODE (XEXP (lhs, 0)) == ASHIFT
             && CONST_INT_P (XEXP (XEXP (lhs, 0), 1))
             && INTVAL (XEXP (XEXP (lhs, 0), 1)) == 8
-            && REG_P (XEXP (XEXP (lhs, 0), 0))
          && CONST_INT_P (XEXP (lhs, 1))
       && GET_CODE (rhs) == AND
          && GET_CODE (XEXP (rhs, 0)) == LSHIFTRT
-            && REG_P (XEXP (XEXP (rhs, 0), 0))
             && CONST_INT_P (XEXP (XEXP (rhs, 0), 1))
             && INTVAL (XEXP (XEXP (rhs, 0), 1)) == 8
          && CONST_INT_P (XEXP (rhs, 1))
-      && REGNO (XEXP (XEXP (rhs, 0), 0)) == REGNO (XEXP (XEXP (lhs, 0), 0)))
+      && rtx_equal_p (XEXP (XEXP (rhs, 0), 0), XEXP (XEXP (lhs, 0), 0)))
 
     {
       rtx lhs_mask = XEXP (lhs, 1);
       rtx rhs_mask = XEXP (rhs, 1);
-
-      return aarch_rev16_shright_mask_imm_p (rhs_mask, mode)
-             && aarch_rev16_shleft_mask_imm_p (lhs_mask, mode);
+      rtx inner = XEXP (XEXP (lhs, 0), 0);
+
+      if (aarch_rev16_shright_mask_imm_p (rhs_mask, mode)
+          && aarch_rev16_shleft_mask_imm_p (lhs_mask, mode))
+        {
+          *op = inner;
+          return true;
+        }
     }
 
   return false;
@@ -214,14 +217,18 @@  aarch_rev16_p_1 (rtx lhs, rtx rhs, machine_mode mode)
    | ((x << 8) & 0xff00ff00)
    for SImode and with similar but wider bitmasks for DImode.
    The two sub-expressions of the IOR can appear on either side so check both
-   permutations with the help of aarch_rev16_p_1 above.  */
+   permutations with the help of aarch_rev16_p_1 above.  Write into OP the
+   rtx that has rev16 applied to it to be used for further operand costing.
+   OP will hold NULL_RTX if the operation is not recognised.  */
 
 bool
-aarch_rev16_p (rtx x)
+aarch_rev16_p (rtx x, rtx *op)
 {
   rtx left_sub_rtx, right_sub_rtx;
   bool is_rev = false;
 
+  *op = NULL_RTX;
+
   if (GET_CODE (x) != IOR)
     return false;
 
@@ -230,10 +237,10 @@  aarch_rev16_p (rtx x)
 
   /* There are no canonicalisation rules for the position of the two shifts
      involved in a rev, so try both permutations.  */
-  is_rev = aarch_rev16_p_1 (left_sub_rtx, right_sub_rtx, GET_MODE (x));
+  is_rev = aarch_rev16_p_1 (left_sub_rtx, right_sub_rtx, GET_MODE (x), op);
 
   if (!is_rev)
-    is_rev = aarch_rev16_p_1 (right_sub_rtx, left_sub_rtx, GET_MODE (x));
+    is_rev = aarch_rev16_p_1 (right_sub_rtx, left_sub_rtx, GET_MODE (x), op);
 
   return is_rev;
 }
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4081680..b3c65d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10389,14 +10389,19 @@  arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
       *cost = LIBCALL_COST (2);
       return false;
     case IOR:
-      if (mode == SImode && arm_arch6 && aarch_rev16_p (x))
-        {
-          *cost = COSTS_N_INSNS (1);
-          if (speed_p)
-            *cost += extra_cost->alu.rev;
+      {
+        rtx inner;
+        if (mode == SImode && arm_arch6 && aarch_rev16_p (x, &inner))
+          {
+            *cost = COSTS_N_INSNS (1);
+            *cost += rtx_cost (inner, BSWAP, 0 , speed_p);
 
-          return true;
-        }
+            if (speed_p)
+              *cost += extra_cost->alu.rev;
+
+            return true;
+          }
+      }
     /* Fall through.  */
     case AND: case XOR:
       if (mode == SImode)